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Abstract. 

Fast Marching and Fast Sweeping are the two most commonly used methods for solving the Eikonal equation. 
Each of these methods performs best on a different set of problems. Fast Sweeping, for example, will outperform 
Fast Marching on problems where the characteristics are largely straight lines. Fast Marching, on the other hand, is 
usually more efficient than Fast Sweeping on problems where characteristics frequently change their directions and 
T~H on domains with complicated geometry. In this paper we explore the possibility of combining the best features of 

7— H both of these approaches, by using marching on a coarser scale and sweeping on a finer scale. We present three new 

f**^ hybrid methods based on this idea and illustrate their properties on several numerical examples with continuous and 

piecewise-constant speed functions in R^. 

> 

1. Introduction. Static Eikonal PDEs arise in a surprisingly wide range of applications: from 
'w' robotic path planning, to isotropic optimal control, to isotropic front propagation, to shape-from- 

QQ shading computations; see [3 8) and references therein for a detailed description. As a result, efficient 

CN| numerical methods for Eikonal PDEs are of interest to many practitioners and numerical analysts. 

In this paper we introduce two hybrid methods intended to blend and combine the best properties 
of the most popular current approaches (Fast Marching and Fast Sweeping). 
These methods are built to solve the non-linear boundary value probleirj^ 

^ \\7u{x)\F{x) = 1, on O C i?^; 

^ u{x) = q{x), on dft. (1.1) 



> 
o 



A discretized version of equation (1.1) is posed at every gridpoint, using upwind divided differ- 
ences to approximate the partial derivatives of u. The exact form of this discretization is introduced 
in section [2j here we simply note that these discretized equations form a system of M coupled non- 
linear equations (where M is the number of gridpoints) and that the key challenge addressed by 
many "fast" methods is the need to solve this system efficiently. Of course, an iterative approach 
is certainly a possibility, but its most straightforward and naive implementation typically leads to 
\^ 0{M^) algorithmic complexity for Eikonal PDE (and potentially much worse for its anisotropic gen- 

eralizations). This is in contrast to the "fast" methods, whose worst-case computational complexity 
^ is 0{M) or 0{M log M). 

Interestingly, most fast Eikonal-solvers currently in use are directly related to the fast algorithms 
" ^ \ developed much earlier to find the shortest paths in directed graphs with non-negative edge-lengths; 

^ see, e.g., [T], [5112]. Two such algorithmic families are particularly prominent: the label-setting meth- 

orfs, which have the optimal worst-case asymptotic computational complexity, and the label- correcting 
^'y^ methods, whose worst-case asymptotic complexity is not as good, but the practical performance is 

at times even better than that of label-setting. We provide a basic overview of both families in 
section [lT The prior fast Eikonal-solvers based on label-setting and label-correcting are reviewed 
in sections 2.1| and |2.2p.3| respectively. 

The most popular methods from these two categories (Fast Marching and Fast Sweeping) have 
been shown to be efficient on a wide range of Eikonal equations. However, each of these methods 
has its own preferred class of problems, on which it significantly outperforms the other. Despite 
experimental comparisons already conducted in |23j and |22) . the exact delineation of a preferred 
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In the journal version, subsections |4.4| and |4. 5| and parts of section |3] were omitted due to space limitations. 

^ For simplicity, we will restrict our exposition to first-order accurate discretizations of these problems on Cartesian 
grids in , although generalizations to higher dimensional domains are straightforward and similar approaches are 
applicable to higher-order accurate discretizations on unstructured meshes in i?" and on manifolds. 
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problem-set for each method is still a matter of debate. Fast Sweeping (reviewed in section 2.1 1 is 
usually more efficient on problems with constant characteristic directions. But for general functions 
F(x), its computational cost is clearly impacted by the frequency and magnitude of directional 
changes of characteristic curves. Fast Marching (reviewed in section 2.1) is generally more efficient 
on domains with complicated geometry and on problems with characteristic directions frequently 
changing. Its causal algorithmic structure results in a provably converged solution on explicitly 
determined parts of the computational domain even before the method terminates ~ a very useful 
feature in many applications. Moreover, its efficiency is much more "robust"; i.e., its computational 
cost is much less affected by any changes in functions F and q or the grid orientation. But as a 
result, the Fast Marching also is not any faster in the simplest cases where F is constant on a convex 
domain and all characteristics are straight lines - the exact scenario where the Fast Sweeping is at 
its most efficient. 

The fundamental idea underlying our hybrid two-scale methods is to take advantage of the best 
features of both marching and sweeping. Suppose the domain is split in a moderate number of cells 
such that F is almost-constant on each of them. (Such cell splitting is possible for any piecewise- 
smooth F.) On the top scale, a version of Fast Marching can be used on a coarse grid (with each 
gridpoint representing a cell of the fine grid). Once the ordering of coarse gridpoints is established, 
the Fast Sweeping is applied to individual cells of the fine grid in the same order. This is the basis 



of our Fast Marching-Sweeping Method (FMSM) described in section 3.1 



Unfortunately, the coarse grid ordering captures the information flow through the fine grid cells 
only approximately: a coarse gridpoint might be "accepted" by Fast Marching before another 
coarse gridpoint y^, even if on the fine grid the characteristics cross both from cell i to cell j and 
from cell j to cell i. The "one-pass" nature of Fast Marching prevents FMSM from acting on such 
interdependencies between different cells even if they are revealed during the application of Fast 
Sweeping to these cells. To remedy this, we introduce the Heap-Cell Method (HCM) described in 
section [3^ The idea is to allow multiple passes through fine grid cells sorted by the representative 
"cell- values" and updated as a result of cell-level fast sweeping. We also describe its heuristic version, 
the Fast Heap-Cell Method (FHCM), where the number of cell- level sweeps is determined based on 
the cell-boundary data. 

Similarly to Fast Marching and Fast Sweeping, our HCM provably converges to the exact solution 
of the discretized equations on the fine scale. In contrast, the even faster FHCM and FMSM usually 
introduce additional errors. But based on our extensive numerical experiments (section [4|, these 
additional errors are small compared to the errors already present due to discretization. The key 
advantage of all three new methods is their computational efficiency - with properly chosen cell 
sizes, we can significantly outperform both Fast Sweeping and Fast Marching on examples difficult 
for those methods, while matching their performance on the examples which are the easiest for each 
of them. We conclude by discussing the current limitations of our approach and several directions 
for future work in section [5l 



1.1. Fast algorithms for paths on graphs. We provide a brief review of common fast meth- 
ods for the classical shortest/cheapest path problems on graphs. Our exposition follows [8] and [9], 
but with modifications needed to emphasize the parallels with the numerical methods in sections [2] 
and|3] 

Consider a directed graph with nodes X ~ {xi, Xm}- Let N{xi) be the set of nodes to which 
Xi is connected. We will assume that n <C M is an upper bound on outdegrees; i.e., |iV(a;i)| < k. We 
also suppose that all arc-costs Cij = C{xi, Xj) are positive and use Cij = +oo whenever Xj ^ N{xi). 
Every path terminates upon reaching the specified exit set Q d X, with an additional exit-cost 
Qi — q{xi) for each Xi £ Q. Given any starting node Xi £ X, the goal is to find the cheapest path 
to the exit starting from Xi. The value function Ui = U{xi) is defined to be the optimal path-cost 
(minimized over all paths starting from Xi). If there exists no path from Xi to Q, then Ui — +oo, 
but for simplicity we will henceforth assume that Q is reachable from each Xi and all C/^'s are finite. 
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The optimality principle states that the "tail" of every optimal path is also optimal; hence, 



Ui ^ min {Cio + t/,} , for Vxi G X\Q; 

U, = q,, for yx, e Q. (1.2) 

This is a coupled system of M non-linear equations, but it possesses a nice "causal" property: if 
Xj G N{xi) is the minimizer, then Ui > Uj. 

In principle, this system could be solved by "value iterations" ; this approach is unnecessarily 
expensive (and is usually reserved for harder stochastic shortest path problems), but we describe it 
here for methodological reasons, to emphasize the parallels with "fast iterative" numerical methods 
for Eikonal PDEs. An operator T is defined on R^^ component-wise by applying the right hand side 



of equation (1.2). Clearly, U 



Ui 
Um 



is a fixed point of T and one can, in principle, recover U 



by value iterations: 

TW'' starting from any initial guess W° € . (1.3) 



Due to the causality of system (1.2), value iterations will converge to U regardless of W'^ after at 
most M iterations, resulting in O(M^) computational cost. (It is easy to show by induction that 
= Ui for every Xi from which there exists an optimal path with at most k transitions.) A 
Gauss-Seidel relaxation of this iterative process is a simple practical modification, where the entries 
of VF'^'+i are computed sequentially and the new values are used as soon as they become available: 
^k+i ^ Ti{Wl^^, . . . , W^^^ ,Wi, . . . , W^i). The number of iterations required to converge will now 
heavily depend on the ordering of the nodes (though M is still the upper bound). We note that. 



again due to causality of (1.2 1, if the ordering is such that Ui > Uj => i > j, then only one full 
iteration will be required (i.e., = U regardless of W'^). Of course, U is not known in advance 
and thus such a causal ordering is usually not available a priori (except in acyclic graphs). If several 
different node orderings are somehow known to capture likely dependency chains among the nodes, 
then a reasonable approach would be to perform Gauss-Seidel iterations alternating through that 
list of preferred orderings ~ this might potentially result in a substantial reduction in the number of 
needed iterations. In section [2.2| we explain how such preferred orderings arise from the geometric 
structure of PDE discretizations, but no such information is typically available in problems on graphs. 
As a result, instead of alternating through a list of predetermined orderings, efficient methods on 
graphs are based on finding advantageous orderings of nodes dynamically. This is the basis for 
label- correcting and label-setting methods. 

A generic label-correcting method is summarized below in algorithm [T] It is easy to prove that 
this algorithm always terminates and that upon its termination V = U] e.g., see [S]. Many different 
label-correcting methods are obtained by using different choices on how to add the nodes to the list 
L and which node to remove (in the first line inside the while loop). If L is implemented as a queue, 
the node is typically removed from the top of L. Always adding the nodes at the bottom of L yields 
the Bellman-Ford method 6J. (This results in a first-in/first-out policy for processing the queue.) 
Always adding nodes at the top of L produces the depth-first-search method, with the intention of 
minimizing the memory footprint of L. Adding nodes at the top if they have already been in L 
before, while adding the "first-timers" at the bottom yields D'Esopo-Pape method |33j. Another 
interesting version is the so called small-labels- first (SLF) method [7], where the node is added at 
the top only if its value is smaller than that of the current top node and at the bottom otherwise. 
Another variation is large-labels-last (LLL) method |10| . where the top node is removed only if its 
value is smaller than the current average of the queue; otherwise it's simply moved to the bottom 
of the queue instead. Yet another popular approach is called thresholding method, where L is split 
into two queues, nodes are removed from the first of them only and added to the first or the second 
queue depending on whether the labels are smaller than some (dynamically changing) threshold 
value [20]. We emphasize that the convergence is similarly obtained for all of these methods, their 
worst-case asymptotic complexity is O(M^), but their comparative efficiency for specific problems 
can be dramatically different. 



Algorithm 1 Generic Label-Correcting pseudocode. 



1; Initialization: 

2: for each node do 



3: i{ Xi G Q then 

4: ^ Qi 

5: else 

6: it N{x,)f]Q then 

7: Vi -s— min {i 

a;jeJV(a;i)nQ 
8: add Xi to the list L 

9: else 

10: Vi ^ OO 

11: end if 

12: end if 



13: end for 

14: 

15: Main Loop: 

16: while L is nonempty do 



17: Remove a node Xj from the list L 

18: for each Xi ^ Q such that Xj £ N{xi) and Vj < Vi do 

19: V ^ dj + Vj 

20: ii V <Vi then 

21: V,^V 

22: if £Cj ^ i then 

23: add Xi to the hst L 

24: end if 

25: end if 

26: end for 



27: end while 



Label-setting algorithms can be viewed as a subclass of the above with an additional property: 
nodes removed from L never need to be re-added later. Dijkstra's classical method [18] is the most 
popular in this category and is based on always removing the node with the smallest label of those 
currently in L. (The fact that this results in no re-entries into the list is yet another consequence of 
the causality; the inductive proof is simple; e.g., see [S].) The need to find the smallest label entails 
additional computational costs. A common implementation of L using heap-sort data structures will 
result in 0(M log M) overall asymptotic complexity of the method on sparsely connected graphs 
(i.e., provided k <C M). Another version, due to Dial [H], implements L as a list of "buckets", so 
that all nodes in the current smallest bucket can be safely removed simultaneously, resulting in the 
overall asymptotic complexity of 0{M). The width of each bucket is usually set to be (5 = min^j Cij 
to ensure that the nodes in the same bucket could not influence or update each other even if they 
were removed sequentially. 

We note that several label-correcting methods were designed to mimic the "no-re-entry" property 
of label-setting, but without using expensive data structures. (E.g., compare SLF/LLL to Dijkstra's 
and thresholding to Dial's.) Despite the lower asymptotic complexity of label-setting methods, label- 
correcting algorithms can be more efficient on many problems. Which types of graphs favor which of 
these algorithms remains largely a matter of debate. We refer readers to [51 [3] and references therein 
for additional details and asynchronous (parallelizable) versions of label-correcting algorithms. 

2. Eikonal PDE, upwind discretization &; prior fast methods. Static Hamilton-Jacobi 



equations frequently arise in exit-time optimal control problems. The Eikonal PDE (1.1 1 describes 
an important subset: isotropic time-optimal control problems. The goal is to drive a system starting 
from a point x £ Q to exit the domain as quickly as possible. In this setting, _F : — >■ 12+ is the 
local speed of motion, and q : dil — > ii is the exit-time penalty charged at the boundary. We note 
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that more general control problems (with an exit-set Q d dfl and trajectories constrained to remain 
inside until reaching Q) can be treated similarly by setting q = +00 on dV,\Q. 

The value function u{x) is defined to be the minimum time-to-exit starting from x and a formal 



argument shows that u should satisfy the equation (1.1). Moreover, characteristics of this PDE 



coinciding with the gradient lines of u, provide the optimal trajectories for moving through the 



domain. Unfortunately, the equation (1.1) usually does not have a classical (smooth) solution on 
the entire domain, while weak solutions are not unique. Additional test conditions are used to select 
among them the unique viscosity solution, which coincides with the value function of the original 
control problem |14[ 113) . A detailed treatment of general optimal control problems in the framework 
of viscosity solutions can be found in 

Many discretization approaches for the Eikonal equation have been extensively studied including 
first-order and higher-order Eulerian discretizations on grids and meshes in R" and on manifolds |36l 
l37l l28t l40] , semi-Lagrangian discretizations [HI [21] , and the related approximations with controlled 
Markov chains [IHl [12] . For the purposes of this paper, we will focus on the simplest first-order 
upwind discretization on a uniform Cartesian grid X (with gridsize /i) on f2 C R^. To simplify the 
description of algorithms, we will further assume that both dfl and Q are naturally discretized on 
the grid X. Our exposition here closely follows 

To introduce the notation, we will refer to gridpoints Xij = {xi,yj), value function approxima- 
tions Uij — U{xij) « u{xij), and the speed F^- = F{xij). A popular first-order accurate discretiza- 



tion of (1.1) is obtained by using upwind finite-differences to approximate partial derivatives: 

{max{D-/U, -Dt^U, 0)f + (max(i?,7C/, -D^^U, O))' - (2.1) 



Where z..(x.,y,) « Df^U = ^^^^^ ^'^ Uy{x,,y,) « A±^C/ = ^^^^^^^'^ . 

If the values at four surrounding nodes are known, this equation can be solved to recover Uij . This is 
best accomplished by computing updates from individual quadrants as follows. Focusing on a single 
node Xij, we will simplify the notation by using U — Uij, F — Fij, and {Ue, Un, Uw, Us} for the 
values at its four neighbor nodes. 

First, suppose that max {DI/'U, -D±'=U, O) = and max (f^'^t/, -D^j'U, O) = -D+/U. This 
implies that U > U^ and the resulting equation yields 

U = h/F + UN- (2.2) 

To compute "the update from the first quadrant" , we now suppose that 
max(.Dr-i7, -D+j'U, O) = -D+^U and max(i^-.^[/, -D+yU, O) = -D+yU. 
This implies that U > Un and JJ >Ue- The resulting quadratic equation is 

U-UeV {U-UnV 1 



h \ h F2 



(2.3) 



We define "the update from the first quadrant" U^-^ to be the root of the above quadratic satisfying 
U > max(C/jv, J7b). If no such root is available, we use the smallest of the "one-sided" updates, 
similar to the previous case; i.e., U^^ = h/F + uYa\{UN, Ue)- If we similarly define the updates from 
the remaining three quadrants, it is easy to show that U = mhi[U^^ , U^^ , U^^ , U^^) satisfies the 



original equation (2.1). 

Remark 2.1. It is also easy to verify that this discretization is 
• consistent, i.e., suppose both sides of (2.1 ) are multiplied by h?; if the true solution u{x) is smooth. 



it satisfies the resulting discretized equation up to 0{h?); 

• monotone, i.e., [/ is a non-decreasing function of each of its neighboring values; 

• causal, i.e., U depends only on the neighboring values smaller than itself |371 138j . 

The consistency and monotonicity can be used to prove the convergence to the viscosity solution 



u(x); see [5]. However, since (2.1) has to hold at every gridpoint Xij G X\Q, this discretization 
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results in a system of M coupled non-linear equations, where M is the number of gridpoints in the 
interior of f2. In principle, this system can be solved iteratively (similarly to the "value iterations" 



process described in (1.3)) with or without Gauss-Seidel relaxation, but a naive implementation of 
this iterative algorithm would be unnecessarily expensive, since it does not take advantage of the 
causal properties of the discretization. Several competing approaches for solving the discretized 
system efficiently are reviewed in the following subsections. 

2.1. Label-setting methods for the Eikonal. The causality property observed above is the 
basis of Dijkstra-like methods for the Eikonal PDE. The first such method was introduced by Tsitsiklis 
for isotropic control problems using first-order semi-Lagrangian discretizations on uniform Cartesian 
grids flSl m] . The Fast Marching Method was introduced by Sethian [3 7) using first-order upwind- 
finite differences in the context of isotropic front propagation. A detailed discussion of similarities and 
differences of these approaches can be found in [42]. Sethian and collaborators have later extended 
the Fast Marching approach to higher-order discretizations on grids and meshes |39| . more general 
anisotropic Hamilton- Jacobi-Bellman PDEs [4lJ [32] ) and quasi- variational inequalities [33] . Similar 
methods were also introduced for semi-Lagrangian discretizations [TS]. The Fast Marching Method 
for the Eulerian discretization (2.1 1 is summarized below in Algorithmic] 



Algorithm 2 Fast Marching Method pseudocode. 



1; Initialization: 

2; for each gridpoint Xij E X do 
3: if Xij G Q then 

4: Label Accepted and set Vij = qixij). 

5: else 

6: Label Xij as Far and set Vij ~ oo. 

7; end if 

8; end for 

9: for each Far neighbor Xij of each Accepted node do 
10: Label Considered and put Xij onto the Considered List L. 

11: Compute a temporary value Vij using the upwinding discretization. 
12: if Vi-j < Vij then 

13: V^j ^ Vj 

14: end if 

15: end for 

16: End Initialization 

17: 

18: while L is nonempty do 

19: Remove the point x with the smallest value from L. 
20: for Xij e N{x) do 

21: Compute a temporary value Vij using the upwinding discretization. 

22: if Vij < Vij then 

23: Vij ^ Vj 

24: end if 

25: if Xij is Far then 

26: Label Considered and add it to L. 

27: end if 

28: end for 

29: end while 



As explained in section [O] the label-setting Dijkstra's method can be considered as a special 
case of the generic label-correcting algorithm, provided the current smallest node in L is always 
selected for removal. Of course, in this case it is more efficient to implement L as a binary heap 
rather than a queue. The same is also true for the Fast Marching Method, and a detailed description 
of an efficient implementation of the heap-sort data structure can be found in |38) . The re-sorting 

6 



of Considered nodes upon each update involves up to O(logM) operations, resulting in the overall 
computational complexity of 0(M log M). 

Unfortunately, the discretization (2.1) is only weakly causal: there exists no (5 > such that 
U^^ > S + m&x{U^,U^) whenever iF^ > max(C/^, f/^). Thus, no safe "bucket width" can be 
defined and Dial-like methods are not applicable to the resulting discretized system. In [IS] Tsitsiklis 
introduced a Dial-like method for a similar discretization but using an 8-neighbor stencil. More 
recently, another Dial-related method for the Eikonal PDE on a uniform grid was introduced in [27] . 
A more general formula for the safe bucket-width to be used in Dial-like methods on unstructured 
acute meshes was derived in |47| . Despite their better computational complexity, Dial- like methods 
often perform slower than Dijkstra-like methods at least on single processor architectures. 

Finally, we note another convenient feature of label-setting methods: if the execution of the 
algorithm is stopped early (before the list L becomes empty), all gridpoints previously removed from 
L will already have provably correct values. This property (unfortunately not shared by the methods 
in sections 2.2||2.3 ) is very useful in a number of applications: e.g., when computing a quickest path 
from a single source to a single target or in problems of image segmentation |38) . 



2.2. Fast Sweeping Methods. Suppose all gridpoints in X are ordered. We will slightly 
abuse the notation by using a single subscript (e.g., Xi) to indicate the particular gridpoint's place in 
that ordering. The double subscript notation (e.g., Xij) will be still reserved to indicate the physical 
location of a gridpoint in the two-dimensional grid. 

Consider discretization (2.1 ) and suppose that the solution U is known for all the gridpoints. Note 
that for each Xi , the value Ui will only depend on one or two of the neighboring values (depending on 
which quadrant is used for a two-sided update, similar to (2.3), and on whether a one-sided update 
is employed, similar to (2.2)). This allows us to define a dependency digraph G on the vertices 
Xi, . . . ,xm with a link from Xi to Xj indicating that Uj is needed to compute Ui. The causality 
of the discretization (2.1) guarantees that G will always be acyclic. Thus, if we were to order the 
gridpoints respecting this causality (i.e., with i > j =4> there is no path in G from Xj to Xi), then 
a single Gauss-Seidel iteration would correctly solve the full system in 0{M) operations. 

However, unless U was already computed, the dependency digraph G will not be generally 
known in advance. Thus, basing a gridpoint ordering on it is not a practical option. Instead, one 
can alternate through a list of several "likely" orderings while performing Gauss-Seidcl iterations. A 
geometric interpretation of the optimal control problem provides a natural list of likely orderings: 
if all characteristics point from SW to NE, then ordering the gridpoints bottom-to-top and left-to- 
right within each row will ensure the convergence in a single iteration (a "SW sweep"). The "Fast 
Sweeping Methods" perform Gauss-Seidel iterations on the system (2.1 1 in alternating directions 
(sweeps). Let m be the number of gridpoints in the x-direction and n be the number in the y- 
direction, and Xij will denote a gridpoint in a uniform Cartesian grid on 17 C i?^. For simplicity, 
we will use the Matlab index notation to describe the ordering of gridpoints in each sweep. There 
are four alternating sweeping directions: from SW, from SE, from NE, and from NW. For the above 
described southwest sweep, the gridpoints Xij will be processed in the following order: i=l:l:m, 
j=l : 1 :n. All four orderings are similarly defined in algorithm [3] 

The alternating sweeps are then repeated until convergence. The resulting algorithm is summa- 
rized inlH 

Remark 2.2. The idea that alternating the order of Gauss-Seidel sweeps might speed up 
the convergence is a centerpiece of many fast algorithms. For Euclidean distance computations it 
was first used by Danielsson in [16]. In the context of general HJB PDEs it was introduced by 
Boue and Dupuis in [I2j for a numerical approximation based on controlled Markov chains. More 
recently, a number of papers by Cheng, Kao, Osher, Qian, Tsai, and Zhao introduced related Fast 
Sweeping Methods to speed up the iterative solving of finite-difference discretizations [HI [STJ [SS] . 
The key challenge for these methods is to find a provable and explicit upper bound on the number 
of iterations. As of right now, such a bound is only available for boundary value problems in which 
characteristics are straight lines. Experimental evidence suggests that these methods can be also very 
efficient for other problems where the characteristics are "largely" straight. The number of necessary 
iterations is largely independent of M and correlated with the number of times the characteristics 
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Algorithm 3 Sweeping Order Selection pseudocode. 
sweep Direction sweepNumber mod 4 



1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
10: 
11: 
12: 
13: 
14: 



if sweepDirection == then 
iOrder ^ (1:1: m) 
jOrder -S— (1 : 1 : n) 

else if sweepDirection 1 then 
iOrder ^ (1 : 1 : m) 
jOrder -f— {n : —1 : 1) 

else if sweepDirection == 2 then 
iOrder (m : —1 : 1) 
jOrder ^ {n : —1 : 1) 

else 

iOrder ^ (m : —1 : 1) 
jOrder ^ (1 : 1 : n) 
end if 



Algorithm 4 Fast Sweeping Method pseudocode. 



1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 



Initialization: 

for each gridpoint Xij E X do 
if Xij e Q then 

Vij ^ q{x.,j). 

else 

Vjj ^ oo. 
end if 
end for 

Main Loop: 
sweepNumber 
repeat 

changed -s- FALSE 

Determine iOrder and jOrder based on sweepNumber 
for i = iOrder do 

for j = jOrder do 
if Xij ^ Q then 



Compute a temporary value Vij using upwinding discretization (2.1). 



if Vij < Vij then 

changed ^ TRUE 
end if 
end if 
end for 
end for 

sweepNumber sweepNumber + 1 
until changed == FALSE 



"switch directions" (i.e., change from one directional quadrant to another) inside fJ. However, since 
the quadrants are defined relative to the grid orientation, the number of iterations will generally be 
grid-dependent . 

One frequently encountered argument is that, due to its 0(A/) computational complexity, the 
Fast Sweeping is more efficient than the Fast Marching, whose complexity is 0(A/logM). However, 
this asymptotic complexity notation hides constant factors - including this not-easily-quantifiable 
(and grid-orientation-dependant) bound on the number of iterations needed in Fast Sweeping. As 
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a result, whether the 0(M) asymptotic complexity actually leads to any performance advantage on 
grids of realistic size is a highly problem-dependent question. Extensive experimental comparisons 
of Marching and Sweeping approaches can be found in [231 122j . Even though such a comparison is 
not the main focus of the current paper, the performance of both methods is also tabulated for all 
examples in section |4j On the grids we tested, we observe that the Fast Marching Method usually 
performs better than the Fast Sweeping when the domain has a complicated geometry (e.g., short- 
est/quickest path computations in a maze) or if the characteristic directions change often throughout 
the domain - the latter situation frequently arises in Eikonal problems when the speed function F 
is highly inhomogeneous. 

We note that the sweeping approach can be in principle useful for a much wider class of problems. 
For example, the method introduced in |25| is applicable to problems with non-convex Hamiltonians 
corresponding to differential games; however, the amount of required artificial viscosity is strongly 
problem-dependent and the choice of consistently discretized boundary conditions can be compli- 
cated. Sweeping algorithms for discontinuous Galerkin finite element discretizations of the Eikonal 
PDF can be found in [301 110] • 

The Fast Sweeping Method performs particularly well on problems where the speed function F 
is constant, since in this case the characteristics of the Eikonal PDF will be straight lines regardless 
of the boundary conditions. (E.g., if g = 0, then the quickest path is a straight line to the nearest 
boundary point.) As a result, the domain consists of 4 subdomains, each with its own characteristic 
"quadrant direction". Even though these subdomains are generally not known in advance, it is 
natural to expect Fast Sweeping to converge in at most 4 iterations (e.g., if aj^j's characteristic comes 
from the SE, then the same is true for all points immediately to SE from Xij). However, on the grid, 
the dependency graph can be more complicated - Xij will depend on both its southern and eastern 
neighbors. The characteristic directions are changing continuously everywhere, except at the shocks. 
So, if Xij is near a shock line, one of its neighbors might be in another subdomain, making additional 
sweeps occasionally necessary even for such simple problems; see Figure [2T] for an illustration. 



Xi 



Fig. 2.1. Four subdomains with a different update quadrant in each of them. If the sweeping directions are 
used in the order (SE, SW, NW, N E), then the node labeled Xij near the shock line will not receive its final update 
until the 5th sweep, since its southern neighbor lies in the southwest subdomain. For simplicity, this example uses 
boundary conditions such that the characteristic directions are constant in each subdomain. As a result, all subdomain 
boundaries coincide with shock lines, which need not be the case in general, but the illustrated effect is generic. 

Nevertheless, when F is constant, the Fast Sweeping is usually more efhcient than the Fast 
Marching regardless of the boundary conditions - an observation which is the basis for the hybrid 
methods introduced in the next section. 



Remark 2.3. It might seem that the recomputation of Vij from (2.1) will generally require 



solving 4 quadratic equations to compare the updates from all 4 quadrants. However, the mono- 



tonicity property noted in Remark 2.1 guarantees that only one quadrant needs to be considered. 



E.g., if Us < Un then U^^ < U^^ and the latter is irrelevant even if we are currently sweeping 
from NE. Thus, the relevant quadrant can be always found by using min([/5, Un) and imii(UE, Uw)- 
We note that this shortcut is not directly applicable to discretizations on unstructured meshes nor 
for more general PDEs. Interestingly, Alton and Mitchell showed that the same shortcut can also be 
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used with Cartesian grid discretizations of Hamilton- Jacobi PDEs with grid-ahgned anisotropy [2]- 

Remark 2.4. One of the problems in this basic version of the Fast Sweeping Method is the fact 
that the CPU time might be wasted to recompute Vij even if none of Xij 's neighbors have changed 
since the last sweep. To address this, one natural modification is to introduce "locking flags" for 
individual gridpoints and to update the currently unlocked gridpoints only [3] . Briefly, all gridpoints 
but those immediately adjacent to Q start out as locked. When an unlocked gridpoint Xij is processed 
during a sweep, if Uij changes, then all of its larger neighbors are unlocked. The gridpoint Xij is 
then itself locked regardless of whether updating Uij resulted in unlocking a neighbor. 

The above modification does not change the asymptotic complexity of the method nor the total 
number of sweeps needed for convergence. Nevertheless, the extra time and memory required to 
maintain and update the locking flags are typically worthwhile since their use allows to decrease the 
amount of CPU-time wasted on parts of the domain, where the iterative process already produced 
the correct numerical solutions. In sections [3] and |4] we will refer to this modified version as Locking 
Sweeping Method (LSM) to distinguish it from the standard implementation of the FSM. 

2.3. Other fast methods for Eikonal equations. Ideas behind many label-correcting algo- 
rithms on graphs have also been applied to discretizations of Eikonal PDEs. Here we aim to briefly 
highlight some of these connections. 

Perhaps the first label-correcting methods developed for the Eikonal PDE were introduced by 
Polymenakos, Bertsekas, and Tsitsiklis based on the logic of the discrete SLF/LLL algorithms [51] . 
On the other hand, Bellman-Ford is probably the simplest label-correcting approach and it has 
been recently re-invented by several numerical analysts working with Eikonal and more general 
Hamilton- Jacobi-Bellman PDEs jTT], [3], including implementations for massively parallel computer 
architectures [23]. A recent paper by Bak, McLaughlin, and Renzi [3J also introduces another "2- 
queues method" essentially mimicking the logic of thresholding label-correcting algorithms on graphs. 
While such algorithms clearly have promise and some numerical comparisons of them with sweeping 
and marching techniques are already presented in the above references, more careful analysis and 
testing is required to determine the types of examples on which they are the most efficient. 

All of the above methods produce the exact same numerical solutions as EMM and FSM. In 
contrast, two of the three new methods introduced in section [3] aim to gain efficiency even if it 
results in small additional errors. We know of only one prior numerical method for Eikonal PDEs 
with a similar trade-off: in [49] a Dial-like method is used with buckets of unjustified width S for a 
discretization that is not 5-causal. This introduces additional errors (analyzed in [35]), but decreases 
the method's running time. However, the fundamental idea behind our new two-scale methods is 
quite different, since we aim to exploit the geometric structure of the speed function. 

3. New hybrid (two-scale) fast methods. We present three new hybrid methods based 
on splitting the domain into a collection of non-overlapping rectangular "cells" and running the 
Fast Sweeping Method on individual cells sequentially. The motivation for this decomposition is to 
break the problem into sub-problems, with F nearly constant inside each cell. If the characteristics 
rarely change their quadrant-directions within a single cell, then a small number of sweeps should 
be sufficient on that cell. But to compute the value function correctly within each cell, the correct 
boundary conditions (coming from the adjacent cells) should be already available. In other words, we 
need to establish a causality-respecting order for processing the cells. The Fast Marching Sweeping 
Method (FMSM) uses the cell-ordering found by running the Fast Marching Method on a coarser 
grid, while the Heap-Cell Methods (HCM and FHCM) determine the cell-ordering dynamically, based 
on the value-updates on cell-boundaries. 

3.1. Fast Marching- Sweeping Method (FMSM). This algorithm uses a coarse grid and 
a fine grid. Each "coarse gridpoint" is taken to be the center of a cell of "fine gridpoints". (For 
simplicity, we will assume that the exit-set Q is directly representable by coarse gridpoints.) The Fast 
Marching is used on the coarse grid, and the Acceptance-order of "coarse gridpoints" is recorded. 
The Fast Sweeping is then used on the corresponding cells in the same order. An additional speed-up 
is obtained, by running a fixed number of sweeps on each cell, based on the upwind directions de- 
termined on the coarse grid. Before providing the details of our implementation, we introduce some 
relevant notation: 
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• X'^ ~ {xl, ...,Xj}, the coarse grid. 

• X-f = a;{,/}, the fine grid (same as the grid used in FMM or FSM). 

• Q'^ C X'^, the set of coarse gridpoints discretizing the exit set Q. 

• If^, the solution of the discretized equations on the coarse grid. 

• V, the temporary label of the coarse gridpoints. 

• Z ~ {ci, ...,cj}, the set of cells, whose centers correspond to coarse gridpoints. 

• N'^{ci), the neighbors of cell q; i.e., the cells that exist to the north, south, east, and west of c^. 
(The set N'^{ci) may contain less than four elements if Ci is a boundary cell.) 

• [ci), the fine grid neighbors of i.e., {ci) — {xj G Xf I and N{xj)f]c, ^ 0}. 

• P : {1, J} — >■ {1, J}, a permutation on the coarse gridpoint indices. 

• h^, the distance along the x-direction between two neighboring coarse gridpoint. 
Assume for simplicity that h% — hy — h'^. 

All the obvious analogs hold for the fine grid {U^ , h-f , etc). Since Fast Marching will be used on 
the coarse grid only, the heap L will contain coarse gridpoints only. 



Algorithm 5 Fast Marching-Sweeping Method pseudocode. 



9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 



Part I: 

Run FMM on X'^ (see algorithm [2| . 

Build the ordering P to reflect the Acceptance-order on X'^. 
Part II: 

Fine grid initialization: 
for each gridpoint x{ E X^ do 
if x{ e then 

V/ ^ q(- 

else 

^ oo; 

end if 
end for 

for j = P(l) : P{J) do 

Define the fine-grid domain c — Cj[_} {cj ) . 
Define the boundary condition as 

q{x{) = q{x{) on f] Qf and 
q{x{) = V/ on Nf{c,). 
Perform Modified Fast Sweeping (see Remark 3.1) on c using boundary conditions q. 
end for 



Remark 3.1. The "Modified Fast Sweeping" procedure applied to individual cells in the 
algorithm [5] follows the same idea as the FSM described in section 2.2 For all the cells containing 
parts of Q (i.e., the ones whose centers are Accepted in the initialization of the FMM on the coarse 
grid) we use the FSM without any changes. For all the remaining cells, our implementation has 3 
important distinctions from the algorithm [4] 

1. No initialization of the fine gridpoints within c is needed since the entire fine grid is pre- 
initialized in advance. 

2. Instead of looping through different sweeps until convergence, we use at most four sweeps and 



only in the directions found to be "upwind" on the coarse grid. As illustrated by Figure 3.1 



the cells in N'^(ci) whose centers were accepted prior to x1 determine the sweep directions 
to be used on 

3. When computing during the sweeping, we do not employ the procedure described in 
Remark 2.3 to find the relevant quadrant. Instead, we use "sweep-directional updates"; e.g., 
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if the current sweeping direction is from the NE, we always use the update based on the 
northern and eastern neighboring fine gridpoints. The advantage is that we have already 
processed both of them within the same sweep. 



Fig. 3.1. Sweeping directions on Ci chosen based on the neighboring cells accepted earlier than Ci (shown in 
green). Note that 2 sweeping directions are conservatively used in the case of a single accepted neighbor. 

Before discussing the computational cost and accuracy consequences of these implementation 
choices, we illustrate the algorithm on a specific example: a checkerboard domain with the speed 
function = 1 in white and = 2 in black checkers, and the exit set is a single point in the center of 
the domain see Figure 3.2 1. This example was considered in detail in |32j : the numerical results and 
the performance of our new methods on the related test problems are described in detail in section 
|4.2[ As explained in Remark [3?T] 2, we do not sweep until convergence on each cell; e.g., the sweeps 
for the cell # 1 in Figure [X2] will be from northwest and southwest, while the cell #14 will be swept 
from northeast only. 
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Fig. 3.2. Left: The 5x5 checkerboard domain with a source point in the slow checker in the center. Right: The 
order of cell- acceptance in Part I of FMSM, assuming that the size of cells and checker is the same. 



The resulting algorithm clearly introduces additional numerical errors - in all but the simplest 



examples, the FMSM's output is not the exact solution of the discretized system (2.1) on . We 



identify three sources of additional errors: the fact that the coarse grid computation does not capture 



all cell interdependencies, and the two cell-sweeping modifications described in Remark 3.1 Of these, 
the first one is by far the most important. Focusing on the fine grid, we will say that the cell Ci 
depends on Cj € N'^(ci) if there exists a gridpoint xj^ G Ci such that Ul directly depends on 
for some gridpoint e Cj. In the limit, as — 0, this means that Ci depends on Cj if there is a 
characteristic going from Cj into Ci (i.e., at least a part of q's boundary shared with Cj is inflow). For 
a specific speed function F and a fixed cell-decomposition Z, a causal ordering of the cells need not 



exist at all. As shown in Figure 3.3 two cells may easily depend on each other. This situation arises 
even for problems where F is constant on each cell; see Figure [4?2| Moreover, if the cell refinement is 
performed uniformly, such non-causal interdependencies will be present even as the cell size h'^ — ^ 0. 
This means that every algorithm processing each cell only once (or even a fixed number of times) 
will unavoidably introduce additional errors at least for some speed functions F . 

One possible way around this problem is to use the characteristic's vacillations between Ci to Cj 
to determine the total number of times that these cells should be alternately processed with FSM. 
This idea is the basis for heap-cell methods described in the next section. However, for the FMSM we 
simply treat these "approximate cell-causality" errors as a price to pay for the higher computational 
efficiency. Our numerical experiments with FMSM showed that, as h'^ — >■ 0, the effects due to the 
approximate cell-causality dominate the errors stemming from using a finite (coarse-grid determined) 
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Fig. 3.3. Two mutually dependent cells. 



number of sweeps. I.e., when the ceUs are sufficiently small, running FSM to convergence does not 
decrease the additional errors significantly, but does noticeably increase the computational cost. The 
computational savings due to our use of "sweep-directional updates" are more modest (we simply 
avoid the necessity to examine/compare all neighbors of the updated node), but the numerical 
evidence indicates that it introduces only small additional errors and only near the shock lines, 
where Vm is undefined. Since characteristics do not emanate from shocks, the accuracy price of 
this modification is even more limited if the errors are measured in Li norm. In section [4] we show 
that on most of the cumulative additional errors in FMSM are typically much smaller than the 
discretization errors, provided h'^ is sufficiently small. 

The monotonicity property of the discretization ensures that the computed solution V-^ will 
always satisfy V/ > Ui. The numerical evidence suggests that V-^ becomes closer to as h'^ 
decreases, though this process is not always monotone. 

The computational cost of Part I is relatively small as long as J ^ M. However, if and AI 
are held constant while h"^ decreases, this results in J — M, and the total computational cost of 
FMSM eventually increases. As of right now, we do not have any method for predicting the optimal 
/i^ for each specific example. Such a criterion would be obviously useful for realistic applications of 
our hybrid methods, and we hope to address it in the future. 

3.2. Label-correcting methods on cells. The methods presented in this section also rely on 
the cell-decomposition Z = {ci, . . . ,cj}, but do not use any coarse-level grid. Thus, X = X^ and 
we will omit the superscripts / and c with the exception of N'^{ci) and [a). We will also use h'^ 
to denote the distance between the centers of two adjacent square cells. In what follows, we will also 
define "cell values" to represent coarse-level information about cell dependencies. Unlike in finite 
volume literature, here a "cell value" is not necessarily synonymous with the average of a function 
over a cell. 



3.2.1. A generic cell-level convergent method. To highlight the fundamental idea, we start 
with a simple "generic" version of a label-correcting method on cells. We maintain a list of cells to 
be updated, starting with the cells containing the exit set Q. While the list is non-empty, we choose 
a cell to remove from it, "process" that cell (by any convergent Eikonal-solver) , and use the new grid 
values near the cell boundary to determine which neighboring cells should be added to the list. The 
criterion for adding cells to the list is illustrated in Figure 3.4 All other implementation details are 
summarized in Algorithm [6] 

It is easy to prove by induction that this method terminates in a finite number of steps; in 
Theorem |3.2| we show that upon its termination V = U on the entire grid X, regardless of the 
specific Eikonal-solver employed to process individual cells (e.g., FMM, FSM, LSM or any other 
method producing the exact solution to (2.1 1 will do). We emphasize that the fact of convergence 
also does not depend on the specific selection criteria for the next cell to be removed from L. However, 
even for a fixed cell-decomposition Z, the above choices will significantly influence the total number of 
list removals and the overall computational cost of the algorithm. One simple strategy is to implement 
L as a queue, adding cells at the bottom and always removing from the top, thus mirroring the logic 
of Bellman- Ford algorithm. In practice, we found the version described in the next subsection to be 
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cell c/ «« cell Cfc 



Fig. 3.4. Suppose that, as a result of processing the cell ci an eastern border value Vi becomes updated. If 
Vi < Vj and Xj ^ Q, the cell will be added to L (unless it is already on the list). 



Algorithm 6 Generic Label-Correcting on Cells pseudocode. 
Cell Initialization: 
for each cell Ck do 

if Cfc n Q 7^ then 

add Ck to the list L 
end if 
end for 

Fine Grid Initialization: 
for each gridpoint Xi do 
ii Xi E Q then 

Vi ^ q{xi) 

else 

Vi i~ oo 
end if 
end for 

Main Loop: 

while L is nonempty do 

Remove a cell c from the list L. 
Define a domain c — cU (c) . 
Define the boundary condition as 

q{xi) — q{xi) on cn Q and 
q{x,) = V, on Nf{c). 
Process c by solving the Eikonal on c using boundary conditions q. 
for each cell c^, e N'^{c)\L do 

if 3xi E [cf] Nf [ck)) AND Xj E {ckr\ N{xi)\Q) such that 

( Vi has changed OR {xi S Q AND c is removed from L for the first time) ) 
AND {Vi < Vj) then 

Add Cfe to the list L. 

end if 
end for 
end while 



more efficient. 

Theorem 3.2. The generic cell-based label- correcting method converges to the exact solution of 
system (2.1). 
Proof. 

First we describe notation and recall from section [2?2] the dependency digraph G. 



• We say Xj depends on Xi if Ui is used to compute Uj (see discussion of formulas (2.2 1 and 



• Tx ~ {nodes in G on which x depends directly}. For each node x, the set Fx will have 0, 1, 
or 2 elements, li x E Q, then is empty. If a one-sided update was used to compute U{x) (see 
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Fig. 3.5. A schematic view of dependency digraph Gx- 



formula (2.2 1), then there is only one element in r^;. 

• Gx denotes the subgraph of G that is reachable from the node x. 

• We define the cell transition distance d{x) — maxXieVxi'^i^i)^ cell_dist(a;, 0;^)}, 

where cell_dist(a;, Xi) = if both x and Xi are in the same cell and 1 otherwise. Note that in general 
d{x) < M, but in practice maxd{x) is typically much smaller. In the continuous limit d{x) is related 
to the number of times a characteristic that reaches x crosses cell boundaries. 

• Ds = {x E G \ d{x) = s}. See Figure 



3.5 



for an illustration of Gx split into Dq, Di 



,D 



d(X)- 

• Ds — {xj E Ds I 3xi e Dg-i such that Xj depends on Xi }, i.e., the set of gridpoints in Ds 
that depend on a gridpoint in a neighboring cell. Note that Dq = 0. 

• Ds = {xi e Ds I ^Xj e Ds+i such that Xj depends on Xi }, i.e., the set of gridpoints in Ds 
that influence a gridpoint in a neighboring cell. 

• 7k- denotes any method that exactly solves the Eikonal on c (see line 20 of algorithm |6| . 
Recall that by the monotonicity property of the discretization (2.1 ), the temporary labels Vj will 

always be greater than or equal to Uj throughout algorithm [6j Moreover, once Vj becomes equal to 
Uj , this temporary label will not change in any subsequent applications of ★ to the cell c containing 
Xj . The goal is to show that Vj ~ Uj for all Xj £ X upon the termination of Algorithm [6[ 

To prove convergence we will use induction on s. First, consider s ~ and note that every cell 
c containing some part of Dq is put in L at the time of the cell initialization step of the algorithm. 
When c is removed from L and ★ is applied to it, every x G Dq D c will obtain its final value 
V{x) = U{x) because Gx contains no gridpoints in other cells by the definition of Dq. 

Now suppose all x G D/^ already have V{x) — U{x) for all k < s. We claim that: 

1) If a cell c contains any x E Ds+i such that V{x) > U{x), then this cell is guaranteed to be in L 
at the point in the algorithm when the last Xi E Ds N-f{c) receives its final update. 

2) The next time ★ is applied to c, V{x) will become equal to U{x) for all x E Ds+i D c. 

To prove 1), suppose -D^+i H c 7^ and note that there exist Xj E Ds+i n c and Xi E Txj with 

Xi E Ds n c for some neighboring cell c. Indeed, if each gridpoint x E Ds+i H c were to depend 
only on those in Ds+i (gridpoints within the same cell) and/or those in D}. for k < s, this would 
contradict x E Dg+i (it is not possible for Tx C Uk<sDk', see Figure 3.5 1. At the time the last such 
Xi receives its final update, we will have Vj > Uj > Ui = Vi since Xi E Txy Thus, c is added to L 
(if not already there) as a result of the add criterion in AlgorithmjG] 

To prove 2), we simply note that all nodes in {Gx\c) C (Ufe=o-^fc) ^^^^ already have correct 
values at this point. 
□ 

Remark 3.3. We note that the same ideas are certainly applicable to finding shortest paths on 
graphs. The Algorithm [T] can be similarly modified using a collection of non-overlapping subgraphs 
instead of cells, but so far we were unable to find any description of this approach in the literature. 

3.2.2. Heap-Cell Method (HCM). To ensure the efficiency of cell-level label-correcting al- 
gorithms, it is important to have the "influential" cells (on which most others depend) processed as 
early as possible. Once the algorithm produces correct solution V = U on those cells, they will never 
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enter the list again, and their neighboring cells will have correct boundary conditions at least on a 
part of their boundary. The same logic can be applied repeatedly by always selecting for removal the 
most "influential" cells currently on the list. We introduce the concept of "cell values" Vj^ — V'^{ck) 
to estimate the likelihood of that cell influencing others (the smaller is Vj^, the more likely is Cfc to 
influence subsequent computations in other cells, and the higher is its priority of removal from the 



list). In Fast Marching-Sweeping Method of section 3.1 the cell values were essentially defined by 
running FMM on the coarse grid. That approach is not very suitable here, since each cell might 
enter the list more than once and it is important to re-evaluate each time this happens. Instead, 
we define and update using the boundary values in the adjacent cells, and modify Algorithm [6] 
to use the cell values as follows: 

1. Amend the cell initialization to set 

Vj^ max q{xi) or ^ oo if Cfc n Q = 0. 

2. Always remove and process the cell with the smallest value currently on the list. Efficient 
implementation requires maintaining L as a heap-sort data structure - hence the name of 
"Heap-Ceh Method" (HCM) for the resulting algorithm. 

3. After solving the Eikonal on c, update the cell values for all Ck & N'^{c) (including those 
already in L). Let bfe be a unit vector pointing from the center of c in the direction of Cfc's 
center and suppose that Xi has the largest current value among the gridpoints inside c but 

'/j. Define y.^ = Xi + 'i^' 



adjacent to Ck', i.e., Xi = argmax Vj. Define = Xi + ^^'t,^ bk- Then 



Xje{cnNf(ck)) 



^ min (V^, V, 
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Fig. 3.6. An illustration corresponding to equation \3.1\(the estimate for a cell value) with = (1,0) 



Remark 3.4. We note that, in the original Dijkstra's and Bellman-Ford methods on graphs, 
a neighboring node's temporary label is updated before that node is added to L. In the Heap-Cell 
Method, the cell value is also updated before adding that cell to the list, but the grid values within 
that cell are updated after it is removed from L. 

Regardless of the method used to compute cell values, they can only provide an estimate of the 
dependency structure. As explained in section |3.1[ a causal cell-ordering need not exist for a fixed 
Z and a general speed functions F. Thus, < Vf does not exclude the possibility of Ck depending 
on Ci, and we do not use cell values to decide which neighboring cells to add to the list - this is still 
done based on the cell boundary data; see Algorithm |6] As a result, the fact of convergence of such 
cell-level methods does not depend on the particular heuristic used to define cell values. There are 
certainly many reasonable alternatives to formula (e.g., a more aggressive/optimistic version can 
instead select Xi — argmin Vj on the boundary; an average value of F on Ck could also be used here; 
or the distance to travel could be measured from Xi to the center of Ck, etc). Empirically, formula 



(3.1 ) results in smaller computational cost than the mentioned alternatives and it was therefore used 
in our implementation. 

Remark 3.5. The cell- values are useful even if L is implemented as a queue and the cells are 
always removed from the top. Indeed, Vj^ can still be used to decide whether Ck should be added at 
the top or at the bottom of L. This is the SLF/LLL strategy previously used to solve the Eikonal 
PDF on the grid-level (i.e., without any cells) by Polymenakos, Bertsekas, and Tsitsiklis [51]. We 
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have also implemented this strategy and found it to be fairly good, but on average less efficient than 
the HCM described above. (The performance comparison is omitted to save space.) The intuitive 
reason is that the SLF/LLL is based on mimicking the logic of Dijkstra's method, but without the 
expensive heap-sort data structures. However, when J M, the cost of maintaining the heap is 
much smaller than the cost of occasionally removing/processing less influential cells from L. 

To complete our description of HCM, we need to specify how the Eikonal PDE is solved on 
individual cells. Since the key idea behind our hybrid methods is to take advantage of the good 
performance of sweeping methods when the speed is more or less constant, we follow the same idea 
as the FSM described in section |2.2[ but with the following important distinctions from the basic 
version of algorithm [4j 

1. No initialization of gridpoint values Vi is needed within c - indeed, the initialization is carried 
out on the full grid at the very beginning and if c is removed from L more than once, the 
availability of previously computed V^^'s might only speed up the convergence on c. Here 
we take advantage of the comparison principle for the Eikonal PDE: the viscosity solution 
cannot increase anywhere inside the cell in response to decreasing the cell-boundary values. 

2. We use the Locking Sweeping version described in Remark |2.4| 

3. The standard FSM and LSM loop through the four sweep directions always in the same 
order. In our implementation of HCM, we choose a different order for the first four sweeps 
to ensure that the "preferred sweep directions" (determined for each cell individually) are 
used before all others. For all other sweeps after the first four, we revert to the standard 
loop defined in Algorithm [3j Of course, as in the standard FSM, the sweeps only continue as 
long as grid values keep changing somewhere inside the cell. The procedure for determining 



preferred sweep directions is explained in Remark 3.6 



Remark 3.6. Recall that in FMSM, the coarse grid information was used to determine the 
sweep directions to use on each cell; see Remark |3.1| and Figure |3.1| Similarly, in HCM we use 
the neighboring cells of that were found to have newly changed Cfe-inflow boundary since the 
last time Ck was added to L. We maintain four "directional flags" - boolean variables initialized to 
FALSE and representing all possible preferred sweeping directions - for each cell Ck currently in L. 
When a neighboring cell ci is processed/updated and is found to influence Cfc, this causes two of c^'s 
directional flags to be set to TRU E. To illustrate, supposing that ci is a currently-processed-western- 
neighbor of (as in Figure 3.4). If the value of a;^ € c; n N^{ck) has just changed and Vi < Vj, 
then both relevant preferred direction flags in Ck (i.e., both NW and SW) will be raised. Once is 
removed from L and processed, its directional flags are reset to FALSE. 

As explained in section [3.2.3[ a better procedure for setting these directional flags could be built 
based on fine-grid information on the cell-boundary. However, we emphasize that the procedure for 
determining preferred directions will not influence the ultimate output of HCM (since we will sweep 
on Ck until convergence every time we remove it from L) , though such preferred directions are usually 
useful in reducing the number of sweeps needed for convergence. 

The performance and accuracy data in section |4] shows that, for sufficiently small h and h'^, HCM 
often outperforms both FMM and FSM on a variety of examples, including those with piecewise 
continuous speed function F. This is largely due to the fact that the average number of times a cell 
enters the heap tends to 1 as h'^ 0. 



3.2.3. Fast Heap-Cell Method (FHCM). We also implement an accelerated version of HCM 
by using the following modifications: 

1. Each newly removed cell is processed using at most four iterations - i.e., it is only swept 
once in each of the preferred directions instead of continuing to iterate until convergence. 

2. Directional flags in all cells containing parts of Q are initialized to TRUE. 

3. To further speed up the process, we use a "Monotonicity Check" on cell-boundary data to 
further restrict the pref erred sweeping directions. For concreteness, assume that c/ and Ck 



are related as in Figure 3.4 If the grid values in {ck) H c/ are monotone non-decreasing 
from north to south, we set c^'s NW preferred direction flag to TRUE; if those grid values 
are monotone non-increasing we flag SW; otherwise we flag both NW and SW. (In contrast, 
both HCM and FMSM are always using two sweeps in this situation; see Figure |3.1| and 
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Remark 3.6 ) We note that the set c n (ck) already had to be examined to compute an 
update to and the above Monotonicity Check can be performed simultaneously. 
The resulting Fast Heap-Cell Method (FHCM) is significantly faster than HCM, but at the cost 
of introducing additional errors (see section |4| . 

The Monotonicity Checks result in a considerable increase in performance since, for small enough 
h", most cell boundaries become monotone. However, generalizing this procedure to higher dimen- 
sional cells is less straightforward. For this reason we decided against using Monotonicity Checks in 
our implementation of HCM. FHCM is summarized in Algorithm [7] 



Algorithm 7 Fast Heap-Cell Method pseudocode. 
1: Cell Initialization: 
2; if cell Ck 3 X ioi X £ Qf then 
3: Add Cfc to the list L\ 

4: Tag all four sweeping directions of as true; 
5: Assign a cell value V'^'=^" :— 0; 
6: else 

7; Assign a cell value Vj:'^''^ :— oo; 
8: end if 

9: Fine Grid Initialization: 
10; if x{ e Q-l^ then 
11: V/ := q{; 
12; else 

13; V/ := oo; 
14; end if 

15; 

16; while L is nonempty do 

17; Remove cell at the top of L; 

18; Perform Non-Directional Fast Sweeping within the cell according to its directions marked 

true, then set all directions to false and: 
19; for Each cell border N,S,E,W do 

20; if the value of a gridpoint xj along a border changes and V/ < for x^- a neighboring 

gridpoint across the border then 
21; Add the cell Ck containing onto L if not already there. 

22; Update the planned sweeping directions for Ck based on the location of the cell 

containing x{ (more about this later). 
23; end if 

24; Compute a value v for the neighbor cell Ck (more about this later) 

25; if V < V^"^^ then 

26; {Vk"^^) ^ V 

27; end if 

28; end for 
29; end while 



As an illustration, we consider another 5x5 checkerboard example (this time with a fast checker 
in the center) and show the contents of the heap in Figure 3.7 
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Here we take the cells coinciding with checkers; finer cell-decompositions are numerically tested 
in section |4.2[ The arrows indicate flagged sweeping directions for each cell, and the smaller font 
is used to show the current cell values. Similarly to Dijkstra's method and FMM, the heap data 
structure is implemented as an array; the bold numbers represent each cell's index in this array. In 
the beginning the central cell is the only one in L; once it is removed, it adds to L all four of its 
neighbors, all of them with the same cell value. Once the first of these (to the west of the center) 
is removed, it adds three more neighbor^ (but not the central cell since there are no characteristics 
passing through the former into the latter). This is similar to the execution path of FMSM, however, 
with heap-cell methods the cells may generally enter the heap more than once. Thus, additional 
errors introduced by FHCM are usually smaller than those in FMSM. 

Remark 3.7. To conclude the discussion of our heap-cell methods, we briefly describe a re- 
cent algorithm with many similar features, but very different goals and implementation details. The 
"Raster scan algorithm on a multi-chart geometry image" was introduced in [48j for geodesic dis- 
tance computations on parametric surfaces. Such surfaces are frequently represented by an atlas of 
overlapping charts, where each chart has its own parametric representation and grid resolution (de- 
pending on the detail level of the underlying surface) . The computational subdomains corresponding 
to charts are typically large and the "raster scan algorithm" (similar to the traditional FSM with a 
fixed ordering of sweep directions) is used to parallelize the computations within each chart. The 
heuristically defined chart values are employed to decide which chart will be raster-scanned next. 



^We note that the cell indexed 4 after the first removal is indexed 2 immediately after the second. This is the 
consequence of the performing removeAhesmallest using the down-heap procedure in the standard implementation of 
the heap; see |39| . 
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Aside from the difference in heuristic formulas used to compute chart values, in |48j the emphasis 
is on providing the most efficient implementation of raster scans on each chart (particularly for 
massively parallel architectures). The use of several large, parametrization/resolution-defined charts, 
typically results in complicated chart interdependencies since most chart boundaries are generally 
both inflow and outflow. Moreover, if this method is applied to any Eikonal problems beyond the 
geodesic distance computations, the monotonicity of characteristic directions will generally not hold 
and a high number of sweeps may be needed on each chart. In contrast, our focus is on reducing 
the cell interdependencies and on the most efficient cell ordering: when h'^ is sufficiently small, most 
cell boundaries are either completely inflow or outflow, defining a causal relationship among the 
cells. Relatively small cell sizes also ensure that F is approximately constant, the characteristics are 
approximately straight lines, and only a small number of sweeps is needed on each cell. Finally, the 
cell orderings are also useful to accelerate the convergence within each cell by altering the sweep- 
ordering based on the location of upwinding cells (as in FMSM and HCM) or based on fine-grid cell- 
boundary data (as in FHCM). The hybrid methods introduced here show that causality-respecting 
domain decompositions can accelerate even serial algorithms on single processor machines. 

4. Numerical Experiments. All examples were computed on a unit square [0, 1] x [0, 1] do- 
main with zero boundary conditions q — on the exit set Q (defined separately in each case). In 
each example that follows we have fixed the grid size h = , and only the cell size /i^ is varied. Since 
analytic formulas for viscosity solutions are typically unavailable, we have used the Fast Marching 
Method on a much finer grid (of size h/A) to obtain the "ground truth" used to evaluate the errors 
in all the other methods. 

Suppose Ci is the absolute value of the error-due-to-discretization at gridpoint Xi (i.e., the error 
produced by FSM or FMM when directly executed on the fine grid), and suppose Ei is the absolute 
value of the error committed by one of the new hybrid methods at the same ajj. Define the set 

— {xi £ X \ Ci 0} and let — \X^\ be the number of elements in it. (We verified that 
Xi ^ =^ Ei = in all computational experiments.) To analyze the "additional errors" introduced 
by FMSM and FHCM, we report 

• the Maximum Error Ratio defined asTZ = niaxi{Ei / ei) , where the maximum is taken over Xi G 

• the Average Error Ratio defined as p = ^^'m/^'"* ' '^^lere the sum is taken over Xi G X+ ; 

• the Ratio of Maximum Errors defined as i? = (-^' ) ^ 

R is relevant since on parts of the domain where e^'s are very small, additional errors might result in 
large TZ even if Ei^s are quite small compared to the Loo norm of discretization errors. In the ideal 
scenario, with no additional errors, TZ = p = R = I. 
For the Heap-Cell algorithms we also report 

• AvHR, the average number of heap removals per cell, 

• AvS, the average number of sweeps per cell, and 

• Mon %, the percentage of times that the "cell-boundary monotonicity" check was successful. 
Finally, we report the number of sweeps needed in FSM and LSM for each problem. 

Remark 4.1. Performance analysis of competing numerical methods is an obviously delicate 
undertaking since the implementation details as well as the choice of test problems might affect 
the outcome. We have made every effort to select representative examples highlighting advantages 
and disadvantages of all approaches. All tests were performed on an AMD Turion 2GHz dual-core 
processor with 3GB RAM. Only one core was used to perform all tests. Our C-|— I- implementations 
were carefully checked for the efficiency of data structures and algorithms, but we did not conduct 
any additional performance tuning or Assembly-level optimizations. Our code was compiled using 
the g++ compiler version 3.4.2 with compiler options -DO -f inline. We have also preformed all 
tests with the full compiler optimization (i.e., with -03); the results were qualitatively similar, but 
we opted to report the performance data for the unoptimized version to make the comparison as 
compiler-independent as possible. For each method, all memory allocations (for grids and heap data 
structures) were not timed; the reported CPU times include the time needed to initialize the relevant 
data structures and run the corresponding algorithm. We also note that the speed function F{x) 
was computed by a separate function call whenever needed, rather than precomputed and stored for 
every gridpoint during initialization. All CPU-times are reported in seconds for the Fast Marching 
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(FMM) , the standard Fast Sweeping (FSM) , the Locking Sweeping (LSM) , and the three new hybrid 
methods (HCM, FHCM, and FMSM). 



4.1. Comb Mazes. The following examples model optimal motion through a maze with slowly 
permeable barriers. Speed function F{x,y) is defined by a "comb maze": F — 1 outside and 
0.01 inside the barriers; see Figure 4.1 The exit set consists of the origin: Q = {(0,0)}. The 
computational cost of sweeping methods is roughly proportional to the number of barriers, while 
FMM is only minimally influenced by this. The same good property is inherited by the hybrid 
methods introduced in this paper. The first example with 4 barriers uses barrier walls aligned with 
cell boundaries and all hybrid methods easily outperform the fastest of the previous methods (LSM) ; 
see Table mH 




I 



i 




I 



A B 

Fig. 4.1. Min time to the point (0,0) on comb maze domains: 4 barriers (A), and 8 barriers (B). 



Table 4.1 

Performance/convergence results for a 4 wall comb maze example. 



Grid Size 


Loo Error 


ii Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


5.9449C-002 


1.4210e-002 


2.45 


6.41 


2.05 


12 



METHOD 


TIME 


■R. 


P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


1.08 








1.151 


3.971 




HCM 44 X 44 cells 


1.10 








1.078 


3.724 




HCM 88 X 88 cells 


1.08 








1.040 


3.593 




HCM 176 X 176 cells 


1.10 








1.020 


3.518 




HCM 352 X 352 cells 


1.24 








1.015 


3.496 




HCM 704 X 704 cells 


1.63 








1.008 


3.468 




FHCM 22 X 22 cells 


0.79 


1.0460 


1.0000 


1.0000 


1.151 


1.618 


85.5 


FHCM 44 X 44 cells 


0.74 


1.0191 


1.0000 


1.0000 


1.078 


1.310 


92.6 


FHCM 88 X 88 cells 


0.74 


1.0085 


1.0000 


1.0000 


1.040 


1.156 


96.2 


FHCM 176 X 176 cells 


0.78 


1.0073 


1.0000 


1.0000 


1.020 


1.080 


98.4 


FHCM 352 X 352 cells 


0.95 


1.0002 


1.0000 


1.0000 


1.015 


1.049 


99.3 


FHCM 704 X 704 cells 


1.41 


1.0000 


1.0000 


1.0000 


1.008 


1.022 


100.0 


FMSM 22 X 22 cells 


0.58 


1.1659 


1.0000 


1.0000 




1.436 




FMSM 44 X 44 cells 


0.54 


1.0706 


1.0000 


1.0018 




1.218 




FMSM 88 X 88 cells 


0.53 


1.0821 


1.0000 


1.0018 




1.110 




FMSM 176 X 176 cells 


0.57 


1.0468 


1.0000 


1.0008 




1.055 




FMSM 352 X 352 cells 


0.71 


1.0378 


1.0000 


1.0004 




1.028 




FMSM 704 X 704 cells 


1.24 


1.0064 


1.0000 


1.0001 




1.014 





We note that even the slowest of the HCM trials outperforms FMM, FSM, and LSM on this 
example. Despite the special alignment of cell boundaries, this example is typical in the following 
ways: 
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1. In both Heap-Cell algorithms, as the number of cells increases, the average number of heap 
removals per cell decreases. 

2. In FHCM the average number of sweeps per cell decreases to 1 as h'^ decreases. 

3. In FHCM the percentage of monotonicity check successes increases as /i^ decreases. 

4. For timing performance in both HCM and FHCM, the optimal choice of h"^ is somewhere in 
the middle of the tested range. 

The reason for #2 is that, as the number of cells J increases, most cells will pass the Monotonicity 
Check. When the monotonicity percentage is high and each cell has on average 2 "upwinding" 
neighboring cells, each cell on the heap will have one sweeping direction tagged. This observation 
combined with #1 explains #2. 

Combining #1 and #2 and the fact that the length of the heap also increases with J there 
is a complexity trade-off that explains #4. As J tends to M, the complexity of both Heap-Cell 
algorithms is similar to that of Fast Marching. As J tends to 1, the complexity of HCM is similar 
to that of Locking Sweeping. 

In the second example we use 8 barriers and the boundaries of the cells are not aligned with 
the discontinuities of the speed function. This example was chosen specifically because it is difficult 
for our new hybrid methods when using the same cell-decompositions as in the previous example. 
The performance data is summarized in Table |4.2[ 

Table 4.2 

Performance/convergence results for an 8 wall comb maze example. 



Grid Size 


Loo Error 


ii Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


6.5644C-002 


1.6865C-002 


2.50 


11.1 


3.20 
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METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


2.13 








2.795 


9.293 




HCM 44 X 44 cells 


7.68 








8.738 


28.046 




HCM 88 X 88 cells 


6.68 








6.798 


22.804 




HCM 176 X 176 cells 


5.86 








5.655 


18.872 




HCM 352 X 352 cells 


2.95 








2.456 


8.314 




HCM 704 X 704 cells 


1.74 








1.037 


3.587 




FHCM 22 X 22 cells 


1.75 


1.4247 


1.0000 


1.0000 


2.946 


4.087 


84.7 


FHCM 44 X 44 cells 


5.86 


1.4250 


1.0000 


1.0000 


8.991 


10.209 


94.0 


FHCM 88 X 88 cells 


4.54 


1.3083 


1.0000 


1.0000 


6.976 


7.329 


98.1 


FHCM 176 X 176 cells 


3.96 


1.2633 


1.0000 


1.0000 


5.754 


5.910 


99.1 


FHCM 352 X 352 cells 


2.13 


1.8922 


1.0000 


1.0000 


2.468 


2.549 


99.1 


FHCM 704 X 704 cells 


1.48 


1.5700 


1.0000 


1.0000 


1.037 


1.066 


100.0 


FMSM 22 X 22 cells 


0.68 


604.49 


6.6555 


21.036 




1.783 




FMSM 44 X 44 cells 


0.59 


228.29 


3.1529 


19.442 




1.385 




FMSM 88 X 88 cells 


0.56 


313.01 


2.7666 


6.4608 




1.195 




FMSM 176 X 176 cells 


0.58 


381.98 


1.7374 


5.5944 




1.097 




FMSM 352 X 352 cells 


0.74 


45.397 


1.1718 


2.0506 




1.049 




FMSM 704 X 704 cells 


1.26 


23.303 


1.1738 


1.3536 




1.024 





Notice that since the edges of cells do not coincide with the edges of barriers, the performance of 
the hybrid methods is not as good as in the previous 4-barrier case, where the edges do coincide. In 
this example the cells that contain a discontinuity of the speed function may not receive an accurate 
cell value (for either the Heap-Cell algorithms or FMSM) and may often have poor choices of planned 
sweeping directions (for FHCM & FMSM). For FHCM, since the error is small in most trials, this 
effect appears to be rectified at the expense of the same cells being added to the heap many times. 
For FMSM, since each cell is processed only once, large error remains. The non-monotonic behavior 
of TZ in FMSM and FHCM appears to be due to changes in positions of cell centers relative to barrier 
edges as h'^ decreases. 

These comb maze examples illustrate the importance of choosing cell placement and cell sizes so 
that the speed is roughly constant in each cell. This is necessary both for a small number of sweeps 
to be effective and for choosing cell values accurately. 
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4.2. Checkerboards. We return to the checkerboard example aheady described in section [O] 
For both the 11x11 and 41 x 41 checkerboard speed functions the center checker is slow. The speed 
is 1 in the slow checkers and 2 in the fast checkers. The exit set is the single point Q = {(0.5, 0.5)}. 




■ 



II 



■I 



■I 



A 



B 



Fig. 4.2. Min time to the center on checkerboard domains: 11 X 11 checkers (A), and 41 X 41 checkers (B). 

Remark 4.2. Such checkerboard examples arise naturally in the context of front propagation 
through composite medium, consisting of a periodic mix of isotropic constituent materials with 
different speed function F . The idea of homogenization is to derive a homogeneous but anisotropic 
speed function F{n)^ describing the large-scale properties of the composite material. After F{n) is 
computed, the boundary value problems can be solved on a coarser grid. A new efficient method for 
this homogenization was introduced in , using FMM on the fine scale grid since the characteristics 
are highly oscillatory and the original implementation of sweeping was inefficient. The same test 
problems were later attacked in [3T] using a version of FSM with gridpoint locking (see Remark 2.4 1. 



The results in Table 4.4 shows that even the Locking- Sweeping Method becomes significantly less 
efficient than FMM with the increase in the number of checkers. 

Table 4.3 

Performance/convergence results for 11 X 11 checkerboard example. 



Grid Size 


Lao Error 


L± Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


3.2639e-003 


1.7738e-003 


3.44 


12.3 


2.28 
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METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


1.84 








1.397 


5.254 




HCM 44 X 44 cells 


1.73 








1.209 


4.613 




HCM 88 X 88 cells 


1.69 








1.083 


4.117 




HCM 176 X 176 colls 


1.72 








1.029 


3.864 




HCM 352 X 352 cells 


1.87 








1.009 


3.768 




HCM 704 X 704 cells 


2.51 








1.003 


3.746 




FHCM 22 X 22 cells 


1.17 


1.0122 


1.0000 


1.0000 


1.399 


1.779 


86.3 


FHCM 44 X 44 cells 


1.11 


1.0208 


1.0000 


1.0000 


1.227 


1.535 


90.6 


FHCM 88 X 88 cells 


1.08 


1.0111 


1.0000 


1.0000 


1.091 


1.247 


95.1 


FHCM 176 X 176 cells 


1.14 


1.0050 


1.0000 


1.0000 


1.029 


1.103 


97.8 


FHCM 352 X 352 cells 


1.33 


1.0006 


1.0000 


1.0000 


1.009 


1.043 


99.4 


FHCM 704 X 704 cells 


2.08 


1.0000 


1.0000 


1.0000 


1.003 


1.020 


100.0 


FMSM 22 X 22 cells 


0.87 


40.312 


1.5725 


13.016 




1.269 




FMSM 44 X 44 cells 


0.91 


18.167 


1.0875 


7.4581 




1.334 




FMSM 88 X 88 cells 


0.89 


7.6692 


1.0113 


3.1400 




1.222 




FMSM 176 X 176 cells 


0.91 


5.4947 


1.0025 


2.4813 




1.127 




FMSM 352 X 352 cells 


1.07 


2.4557 


1.0004 


1.3888 




1.067 




FMSM 704 X 704 cells 


1.84 


1.5267 


1.0000 


1.0032 




1.035 





In both examples the cell sizes were chosen to align with the edges of the checkers (i.e., the 
discontinuities of the speed function). On the 11 x 11 checkerboard, almost all of the HCM trials 



Table 4.4 

Performance/convergence results for 41 X 41 checkerboard example. 



Grid Size 


Lao Error 


L± Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1312 X 1312 


1.2452e-002 


6.6827C-003 


4.13 


58.9 


11.7 


45 



METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 41 X 41 cells 


4.18 








3.261 


11.926 




HCM 82 X 82 cells 


3.05 








1.571 


5.939 




HCM 164 X 164 cells 


2.84 








1.314 


4.831 




HCM 328 X 328 cells 


2.81 








1.080 


3.972 




HCM 656 X 656 cells 


3.36 








1.026 


3.768 




FHCM 41 X 41 cells 


2.83 


1.7506 


1.0041 


1.7123 


3.261 


4.600 


75.5 


FHCM 82 X 82 cells 


2.09 


1.0299 


1.0006 


1.0128 


1.584 


2.147 


78.8 


FHCM 164 X 164 cells 


1.95 


1.0103 


1.0001 


1.0000 


1.321 


1.670 


90.4 


FHCM 328 X 328 cells 


2.01 


1.0173 


1.0000 


1.0000 


1.080 


1.236 


96.9 


FHCM 656 X 656 cells 


2.79 


1.0075 


1.0000 


1.0000 


1.026 


1.106 


100.0 


FMSM 41 X 41 cells 


1.46 


12.398 


3.4110 


3.3991 




1.164 




FMSM 82 X 82 cells 


1.54 


10.551 


1.0975 


1.7662 




1.211 




FMSM 164 X 164 cells 


1.70 


4.7036 


1.0142 


1.7123 




1.281 




FMSM 328 X 328 cells 


1.88 


2.0192 


1.0020 


1.7123 




1.242 




FMSM 656 X 656 cells 


2.65 


1.7506 


1.0004 


1.7123 




1.147 





outperforms FMM and LSM, and most of the FHCM trials are more than twice as fast as LSM and 



three times faster than FMM while the additional errors are negligible; see Table 4.3 



The 41 X 41 example is much more difficult for the sweeping algorithms because the number of 
times the characteristics changes direction increases with the number of checkers. We note that the 
performance of FMM is only moderately worse here (mostly due to a larger length of level curves 
and the resulting growth of the "Considered List" ) . Again, almost all hybrid methods outperform 
all other methods. The difference is less striking than in the 11 x 11 example when compared with 
FMM, but FHCM and FMSM are 4 to 6 times faster than LSM; see Table Ol 
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4.3. Continuous speed functions with a point source. Suppose the speed function is 
F = 1 and the exit set consists of a single point Q = {(0.5, 0.5)}. In this case the viscosity solution 
is simply the distance to the center of the unit square. We also note that the causal ordering of 
cells is clearly available here; as a result, FHCM and FMSM do not introduce any additional errors. 
The performance data is summarized in Table |4.5[ For constant speed functions LSM performs 
significantly better than FMM on fine meshes (such as this one). The reason why FMSM and 
FHCM are faster than LSM in some trials is that LSM checks all parts of the domain in each sweep, 
including non-downwinding or already-computed parts. Additionally LSM must perform a final 
sweep to check that all gridpoints are locked. All of the hybrid algorithms slow down monotonically 
as J increases because of the cost of sorting the heap. 

Table 4.5 

Performance/convergence results for constant speed function. 



Grid Size 


Loo Error 


L± Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


1.0956C-003 


6.8382C-004 


2.72 


2.07 


0.83 


5 



METHOD 


TIME 




p 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


1.05 








1.000 


3.692 




HCM 44 X 44 cells 


1.12 








1.000 


3.718 




HCM 88 X 88 cells 


1.10 








1.000 


3.733 




HCM 176 X 176 cells 


1.14 








1.000 


3.742 




HCM 352 X 352 cells 


1.29 








1.000 


3.746 




HCM 704 X 704 cells 


1.76 








1.000 


3.748 




FHCM 22 X 22 cells 


0.66 


1.0000 


1.0000 


1.0000 


1.000 


1.025 


100.0 


FHCM 44 X 44 cells 


0.67 


1.0000 


1.0000 


1.0000 


1.000 


1.006 


100.0 


FHCM 88 X 88 cells 


0.69 


1.0000 


1.0000 


1.0000 


1.000 


1.002 


100.0 


FHCM 176 X 176 cells 


0.75 


1.0000 


1.0000 


1.0000 


1.000 


1.000 


100.0 


FHCM 352 X 352 cells 


0.92 


1.0000 


1.0000 


1.0000 


1.000 


1.000 


100.0 


FHCM 704 X 704 cells 


1.47 


1.0000 


1.0000 


1.0000 


1.000 


1.000 


100.0 


FMSM 22 X 22 cells 


0.47 


1.0000 


1.0000 


1.0000 




1.103 




FMSM 44 X 44 cells 


0.47 


1.0000 


1.0000 


1.0000 




1.049 




FMSM 88 X 88 cells 


0.49 


1.0000 


1.0000 


1.0000 




1.024 




FMSM 176 X 176 cells 


0.53 


1.0000 


1.0000 


1.0000 




1.012 




FMSM 352 X 352 cells 


0.67 


1.0000 


1.0000 


1.0000 




1.006 




FMSM 704 X 704 cells 


1.23 


1.0000 


1.0000 


1.0000 




1.003 





Next we consider examples of min-time to the center under two different oscillatory continu- 
ous speed functions. For F{x,y) = 1-1- ^ sin(207ra;) sin(207ry) the level sets of the value function 



are shown in Figure and the performance data is summarized in Table 4.6 For F(x,y) 



1 + 0.99 sin(27rx) sin(27ry) the level sets of the value function are shown in Figure |4.3|3 and the 



performance data is summarized in Table 4.7 



Note that HCM outperforms Fast Marching on all trials, and outperforms the sweeping meth- 
ods significantly on the first example (Table |4.6[ ) despite the fact that no special selection of cell 
boundaries was made. Small changes in the frequency of the speed function did not significantly 



alter the performance of the hybrid algorithms. In the second example (Table 4.7) most HCM trials 
were again faster than LSM and FMM. Note that for some cell sizes, both FMSM and FHCM have 
i? <C 7?. = maxj(i?j/ej). Whenever R is close to 1, the rate of convergence of hybrid methods (based 
on Loo errors) is the same as that of FMM and FSM. 

4.4. Performance on coarser grids. Our hybrid methods exploit the fact that there exists 
h'^ small enough so that most cell-boundaries will be either fully inflow or fully outflow and most 
pairs of cells will not be mutually dependent. But if the original grid X is sufficiently coarse, this 
may not be possible to achieve since we also need h'^ > 2h (otherwise FMM is clearly more efficient). 
In this subsection we return to some of the previous examples but on significantly coarser grids, to 
test whether the hybrid methods remain competitive with FMM and LSM. The performance data 
is summarized in Tables [4!8]|4.11[ to improve the accuracy of timing on coarser grids, all CPU times 
are reported for 20 executions of each algorithm. 
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Fig. 4.3. Min time to the center under sinusoidal speed functions. 
Table 4.6 

Performance/ convergence results for F{x,y) = 1 + 5 sin(207r2;) sin(207rj/). 



Grid Size 


Loo Error 


Li Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


4.7569C-003 


1.9724C-003 


3.74 


23.7 


6.39 
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METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


3.61 








1.913 


10.785 




HCM 44 X 44 cells 


2.97 








1.446 


6.811 




HCM 88 X 88 cells 


2.60 








1.245 


5.201 




HCM 176 X 176 cells 


2.40 








1.117 


4.350 




HCM 352 X 352 cells 


2.40 








1.047 


3.945 




HCM 704 X 704 cells 


2.92 








1.016 


3.788 




FHCM 22 X 22 cells 


2.72 


5.6062 


1.1358 


2.0960 


4.413 


5.310 


67.3 


FHCM 44 X 44 cells 


1.82 


3.1094 


1.1480 


1.0000 


1.555 


2.132 


78.7 


FHCM 88 X 88 cells 


1.61 


1.4025 


1.0122 


1.0000 


1.277 


1.575 


88.2 


FHCM 176 X 176 cells 


1.53 


1.0560 


1.0022 


1.0000 


1.125 


1.262 


94.5 


FHCM 352 X 352 cells 


1.65 


1.0226 


1.0004 


1.0000 


1.048 


1.106 


98.1 


FHCM 704 X 704 cells 


2.40 


1.0037 


1.0001 


1.0000 


1.016 


1.035 


100.0 


FMSM 22 X 22 cells 


1.14 


10.497 


2.4811 


2.9653 




1.262 




FMSM 44 X 44 cells 


1.10 


6.0892 


1.3657 


2.2889 




1.200 




FMSM 88 X 88 cells 


1.16 


4.6801 


1.0515 


1.9504 




1.213 




FMSM 176 X 176 cells 


1.18 


3.4828 


1.0074 


1.3705 




1.126 




FMSM 352 X 352 cells 


1.34 


1.5987 


1.0007 


1.0000 




1.067 




FMSM 704 X 704 cells 


2.14 


1.1262 


1.0001 


1.0000 




1.035 





Since M is much smaller here, the logM term in the complexity of Fast Marching plays less 
of a role. On most of the examples in this subsection, HCM and FHCM are not much faster than 
Fast Marching or Locking Sweeping. For example, in Table [4!9| even though the cell boundaries are 
perfectly aligned with the checker boundaries, both Heap-Cell methods are merely on par with Fast 
Marching. Note that when h is sufficiently small, their advantage over FMM and LSM is clear (see 
Table |4.4[ ). FMSM, however, is about twice as fast as the faster of FMM and LSM. In addition, 
FMSM's error ratios {R, TZ, and p) are smaller here than for the same examples on finer grids in 



Remark 4.3. Since two of the hybrid methods introduce additional errors, an important 
question is, "Given the total errors resulting from FHCM and FMSM at a given resolution {h, h'^), 
for which h > h would FMM commit similar errors, and how well would FMM perform on that 
new coarser grid?" For simplicity, assume in the following discussion that the CPU time required by 
FMM is roughly linear in M = 0{h~^) and that the resulting Loo error is 0{h). These are reasonable 
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Table 4.7 

Performance/convergence results for Fix, y) = 1 + 0.99 sin(27ra::) sin(27rj/) 



Grid Size 


Loo Error 


ii Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


2.1793e-002 


9.8506e-004 


3.69 


12.7 


2.73 


13 



METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


2.29 








1.165 


4.651 




HCM 44 X 44 cells 


2.15 








1.070 


4.132 




HCM 88 X 88 cells 


2.11 








1.034 


3.920 




HCM 176 X 176 cells 


2.13 








1.015 


3.811 




HCM 352 X 352 cells 


2.26 








1.008 


3.763 




HCM 704 X 704 cells 


2.80 








1.002 


3.741 




FHCM 22 X 22 cells 


1.37 


60.848 


1.0020 


1.0014 


1.174 


1.409 


92.7 


FHCM 44 X 44 cells 


1.28 


4.5786 


1.0002 


1.0001 


1.078 


1.185 


96.1 


FHCM 88 X 88 cells 


1.28 


1.0224 


1.0000 


1.0000 


1.039 


1.086 


98.2 


FHCM 176 X 176 cells 


1.35 


1.0019 


1.0000 


1.0000 


1.017 


1.039 


99.3 


FHCM 352 X 352 cells 


1.55 


1.0003 


1.0000 


1.0000 


1.008 


1.018 


99.7 


FHCM 704 X 704 cells 


2.27 


1.0001 


1.0000 


1.0000 


1.002 


1.006 


100.0 


FMSM 22 X 22 cells 


1.13 


1362.4 


1.0270 


1.0053 




1.231 




FMSM 44 X 44 cells 


1.06 


174.62 


1.0054 


1.0053 




1.116 




FMSM 88 X 88 cells 


1.05 


38.545 


1.0021 


1.0046 




1.057 




FMSM 176 X 176 cells 


1.09 


7.1581 


1.0006 


1.0046 




1.029 




FMSM 352 X 352 cells 


1.28 


1.1687 


1.0001 


1.0028 




1.014 




FMSM 704 X 704 cells 


2.08 


1.0724 


1.0000 


1.0000 




1.007 





assumptions for coarse grids; e.g., see Tables [478]|4.11[ For example, if we want to decrease time by 
a factor of p^, then M — )■ M/p^ , h —?' p * h (in 2-d), and errors would increase by a factor of p. 
Such estimates allow for a more accurate performance comparison between FMM and FMSM (or 
FHCM) based on the ratio R. Dividing the reported FMM time by the value R^, we will arrive 
at an estimate for the new FMM time computed on a coarser /i-grid with errors similar to those 
committed by FMSM on an (/i, ft, -grid. 

Among Tables 4.8|4.11[ the overall worst-case scen ario for FMSM under this analysis is the 



11 X 11 checkerboard example. Using the data in Table 4.8 with M = 176 and comparing FMM 



with FMSM at 22^, 44^, and 88^ ceUs, the new estimated FMM times would be 0.82/(2.3922) = .343, 
0.82/(1.34892) = .608, and 0.82/(1.0042) = .817. Comparing this to 0.29, 0.35, 0.53 reported for 
FMSM, we see that each of the cell trials still outperforms the corresponding improved time of FMM. 
Similar conclusions are reached when this analysis is performed using error ratios in Li norms. 
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Table 4.8 

Performance/convergence results for 20 trials of 11 x 11 checkerboard example on a coarse grid. 



Grid Size 


£/oo Error 


Z/i Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


176 X 176 


2.0986e-002 


1.1087e-002 


0.82 


3.91 


0.81 


16 



METHOD 


TIME 


TZ. 


P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


0.59 








1.438 


5.134 




HCM 41 X 44 colk 


0.59 








1.171 


1.199 




HCAl 88 X 88 colls 


0.72 








1.011 


:-!.779 




FHCM 22 X 22 cells 


0.41 


1.0017 


1.0000 


1.0000 


1.440 


1.804 


88.2 


FHCM 44 X 44 cells 


0.43 


1.0015 


1.0000 


1.0000 


1.171 


1.374 


97.0 


FHCM 88 X 88 cells 


0.59 


1.0000 


1.0000 


1.0000 


1.041 


1.158 


100.0 


FMSM 22 X 22 cells 


0.29 


5.1670 


1.0770 


2.3920 




1.269 




FMSM 44 X 44 cells 


0.35 


2.2742 


1.0066 


1.3489 




1.334 




FMSM 88 X 88 cells 


0.53 


1.2309 


1.0004 


1.0040 




1.221 





Grid Size 


Lao Error 


L\ Error 


FMM Time 


FSM Time 


LSM Time 


Sweeps 


352 X 352 


1.1470e-002 


6.0787e-003 


3.52 


15.4 


3.16 


16 



METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


2.40 








1.438 


5.302 




HCM 44 X 44 cells 


2.25 








1.208 


4.465 




HCM 88 X 88 cells 


2.32 








1.059 


3.904 




HCM 176 X 176 cells 


2.91 








1.018 


3.757 




FHCM 22 X 22 cells 


1.61 


1.1194 


1.0002 


1.0725 


1.490 


1.936 


84.9 


FHCM 44 X 44 cells 


1.53 


1.0434 


1.0000 


1.0000 


1.228 


1.508 


92.2 


FHCM 88 X 88 colls 


1.69 


1.0745 


1.0000 


1.0000 


1.0.59 


1.190 


97.5 


FllCAl 175 X 175 colls 


2.10 


1.0273 


1.0000 


1.0000 


1.018 


1.085 


100.0 


FMSM 22 X 22 cells 


1.12 


10.551 


1.1593 


4.0315 




1.269 




FMSM 44 X 44 cells 


1.21 


4.7036 


1.0252 


3.9089 




1.334 




FMSM 88 X 88 cells 


1.38 


4.1945 


1.0093 


3.9089 




1.222 




FMSM 176 X 176 cells 


2.12 


4.1945 


1.0074 


3.9089 




1.127 





Table 4.9 

Performance/convergence results for 20 trials of 41 x 41 checkerboard on a coarse grid. 



Grid Size 


Loo Error 


L\ Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


164 X 161 


7.1112O-002 


3.8397O-002 


1.08 


17.9 


4.01 


44 



METHOD 


TIME 


-R. 


P 


R 


AvHR 


AvS 


Mon % 


HCM 41 X 41 colls 


1.13 








2.204 


7.041 




HCM 82 X 82 colls 


1.05 








1.261 


4.215 




FHCM 41 X 41 colls 


0.85 


1.0000 


1.0000 


1.0000 


2.204 


2.449 


92.2 


FHCM 82 X 82 cells 


0.90 


1.0000 


1.0000 


1.0000 


1.261 


1.474 


100.0 


FMSM 41 X 41 cells 


0.53 


1.4878 


1.0850 


1.0197 




1.163 




FMSM 82 X 82 cells 


0.77 


1.1277 


1.0162 


1.0193 




1.210 





Grid Size 


Loo Error 


L\ Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


328 X 328 


4.0403e-002 


2.3205e-002 


4.44 


73.3 


16.6 


45 



METHOD 


TIME 


TZ. 


P 


R 


AvHR 


AvS 


Mon % 


HCM 41 X 41 cells 


5.42 








2.873 


9.970 




HCM 82 X 82 colls 


4.02 








1.500 


5.104 




HCM 164 X 164 colls 


4.19 








1.181 


4.105 




FHCM 41 X 41 colls 


3.65 


1.0988 


1.0008 


1.0679 


2.873 


3.802 


81.6 


FHCM 82 X 82 colls 


2.90 


1.0236 


1.0000 


1.0000 


1.501 


1.923 


88.0 


FllCAl Kil X Kil colls 


3.r,5 


1.0000 


1.0000 


1.0000 


1.181 


l.:-!8i 


100.0 


FMSM 41 X 41 cells 


1.88 


2.9459 


1.4364 


1.4668 




1.164 




FMSM 82 X 82 cells 


2.22 


2.3040 


1.0533 


1.1457 




1.211 




FMSM 164 X 164 cells 


3.27 


1.1540 


1.0009 


1.0679 




1.281 
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Table 4.10 

Performance/convergence results for 20 trials of F(x,y) = 1 + f sin(207ra;) sin(207r?/) on a coarse grid. 



Grid Size 


Loo Error 


Li Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


176 X 176 


3.6535e-002 


1.3374e-002 


0.94 


8.77 


3.08 


28 



METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


0.97 








1.773 


8.233 




HCM 44 X 44 cells 


0.88 








1.280 


4.992 




HCM 88 X 88 cells 


0.87 








1.100 


3.975 




FHCM 22 X 22 cells 


0.66 


1.3736 


1.0209 


1.0000 


2.153 


2.814 


69.3 


FHCM 44 X 44 cells 


0.60 


1.1703 


1.0186 


1.0000 


1.285 


1.684 


87.7 


FHCM 88 X 88 cells 


0.71 


1.1170 


1.0072 


1.0000 


1.100 


1.234 


100.0 


FMSM 22 X 22 cells 


0.38 


7.0809 


1.2945 


1.0359 




1.244 




FMSM 44 X 44 cells 


0.42 


2.2023 


1.0402 


1.0100 




1.197 




FMSM 88 X 88 cells 


0.64 


1.0945 


1.0024 


1.0000 




1.213 





Grid Size 


Loo Error 


Li Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


352 X 352 


1.84146-002 


7.0584e-003 


3.92 


33.7 


11.1 
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METHOD 


TIME 


7i 


P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


4.43 








1.909 


9.864 




HCM 44 X 44 cells 


3.57 








1.403 


5.969 




HCM 88 X 88 cells 


3.18 








1.178 


4.493 




HCM 176 X 176 cells 


3.45 








1.060 


3.891 




FHCM 22 X 22 cells 


2.89 


1.8770 


1.0300 


1.0202 


2.905 


3.630 


66.2 


FHCM 44 X 44 cells 


2.29 


1.8064 


1.0712 


1.0000 


1.425 


1.918 


82.2 


FHCM 88 X 88 cells 


2.23 


1.2724 


1.0108 


1.0000 


1.182 


1.394 


93.7 


FHCM 176 X 176 cells 


2.84 


1.0500 


1.0016 


1.0000 


1.060 


1.130 


100.0 


FMSM 22 X 22 cells 


1.44 


4.3257 


1.4890 


1.1939 




1.246 




FMSM 44 X 44 cells 


1.46 


2.2958 


1.0975 


1.1932 




1.197 




FMSM 88 X 88 cells 


1.78 


1.7082 


1.0110 


1.0806 




1.213 




FMSM 176 X 176 cells 


2.57 


1.0845 


1.0010 


1.0000 




1.126 





Table 4.11 

Performance/convergence results for 20 trials F{x, y) = 1 + 0.99sin(27ra;) sin(27ry) on a coarse grid. 



Grid Size 


Loo Error 


il Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


176 X 176 


1.0533e-001 


5.6430e-003 


0.93 


4.00 


0.93 


13 



METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


0.74 








1.165 


4.496 




HCM 44 X 44 cells 


0.73 








1.085 


4.040 




HCM 88 X 88 cells 


0.83 








1.026 


3.790 




FHCM 22 X 22 cells 


0.47 


1.0952 


1.0020 


1.0004 


1.169 


1.388 


94.2 


FHCM 44 X 44 cells 


0.50 


1.0200 


1.0005 


1.0000 


1.087 


1.173 


97.9 


FHCM 88 X 88 cells 


0.66 


1.0045 


1.0001 


1.0000 


1.027 


1.051 


100.0 


FMSM 22 X 22 cells 


0.37 


1.2819 


1.0044 


1.0164 




1.231 




FMSM 44 X 44 cells 


0.41 


1.1839 


1.0007 


1.0053 




1.116 




FMSM 88 X 88 cells 


0.59 


1.0979 


1.0001 


1.0000 




1.057 





Grid Size 


Loo Error 


L\ Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


352 X 352 


6.8813e-002 


3.1818e-003 


3.84 


15.9 


3.64 


13 



METHOD 


TIME 


TZ. 


P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


3.00 








1.178 


4.624 




HCM 44 X 44 cells 


2.76 








1.076 


4.082 




HCM 88 X 88 cells 


2.83 








1.033 


3.853 




HCM 176 X 176 cells 


3.29 








1.008 


3.747 




FHCM 22 X 22 cells 


1.82 


1.1364 


1.0040 


1.0004 


1.178 


1.405 


93.2 


FHCM 44 X 44 cells 


1.71 


1.0204 


1.0005 


1.0000 


1.080 


1.170 


97.7 


FHCM 88 X 88 cells 


1.98 


1.0034 


1.0001 


1.0000 


1.034 


1.071 


99.2 


FHCM 176 X 176 cells 


2.69 


1.0006 


1.0000 


1.0000 


1.008 


1.022 


100.0 


FMSM 22 X 22 cells 


1.44 


2.3482 


1.0080 


1.0074 




1.231 




FMSM 44 X 44 cells 


1.42 


1.5167 


1.0014 


1.0037 




1.116 




FMSM 88 X 88 cells 


1.61 


1.1989 


1.0004 


1.0034 




1.057 




FMSM 176 X 176 cells 


2.44 


1.0953 


1.0001 


1.0015 




1.028 
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Remark 4.4. We could perform a similar comparison between FMSM and sweeping methods, 
but the latter allow for yet another speed up technique: the sweeping can be stopped before the full 
convergence to the solution of system (2.1 1. In fact, in many implementations of Fast Sweeping, the 
method terminates when the changes in grid values due to the most recent sweep fall below some 
positive threshold t*; e.g., see j26]. Similarly to FHCM and FMSM, this results in additional errors, 
and it is useful to consider both these errors and the corresponding savings in computational time. To 
the best of our knowledge, this issue has not been analyzed so far. The practical implementations of 
FSM and LSM typically select t* heuristically or make it proportional to the grid-size h. It is usually 



claimed that the number of sweeps necessary for convergence is /i-independent [331 [5T]. Tables 4.9 



and 4.10 show that the number of sweeps-to-convergence (i.e., for t* = 0) may in fact depend on h. 
We believe that Figure 2.1 provides one possible explanation for this phenomenon (since the location 
of gridpoints relative to shocklines is /i-dependent). 
For t* > 0, the more relevant questions are: 

1. How well do the changes in the most recent sweep represent the additional errors, which 
would result if we were to stop the sweeping? 

2. Is the number of sweeps (needed for a fixed t* > 0) really /i- independent? 

3. Supposing the additional ("early-termination") errors could be estimated, would the number 
of required sweeps be /i-independent? 

4. Supposing FSM or LSM were run for as many sweeps as necessary to make the additional 
errors approximately the same as those introduced by FMSM or FHCM, would the resulting 
computational costs be less than those of hybrid methods? 

To answer these questions for one specific (41 x 41 checkerboard) example, we have run both sweeping 
methods on 164^ and 1312^ grids. In table 4.12 we report the ioo change in grid values, the percentage 
of gridpoints changing, and potential early-termination errors (7?., p, and R) after each sweep. At 
least for this particular example: 

1. The answer to Question 1 is inconclusive, though the max changes are clearly correlated 
with R and p. 

2. The answer to Question 2 is negative; moreover, after the same number of sweeps, the max 
changes on the 1312^ grid are clearly larger than on the 164^ grid. 

3. The answer to Question 3 is negative; e.g., R reduces below 1.1 after only 12 sweeps on the 
164^ grid, but the same reduction on the 1312^ grid requires 42 sweeps. 

4. To answer the last question, we note that for this example FHCM produces very small addi- 
tional errors, while FMSM results in R = 1.0197 a nd R = 1.0193 (on the 164^ grid with 21^ 
and 42^ cells, respectively; see Table 4.9). As Table 4.12 \ shows, 16 sweeps would be needed 
for FSM or LSM to produce the same R values on this grid. Our computational experiment 
shows that FSM and LSM times for these 16 sweeps are 6.62 and 2.91 seconds respectively 
(note that this is the total time for 20 trials, similar to the times reported in Table 4.9 1. 
Thus, FMSM is still more than 3.5 times faster than the early-terminated LSM and more 
than 8 times faster than the early-terminated FSM. For the 1312^ example, we see that 
the error ratios take longer to converge to 1 for the sweeping methods (Table 4.12| 3). The 
FMSM's R values of {3.3991, 1.7662, 1.712 3} (from Table |4l) for different ceh sizes) corre- 
spond to {28, 37, 37} sweeps in Table 4.12| 3. The experimentally measured early-terminated 
execution times for FSM and LSM are {36.85,48.77,48.77} seconds and {7.40, 11.68, 11.68} 
seconds respectively. Again, FMSM still holds a large advantage (more than 4 times faster 
than LSM and more than 18 times faster than FSM). We note that for both the 164^ and 
1312^ cases, the early-terminated FSM time was linear in the number of sweeps, while LSM 
did not receive as much of a speed boost; this is natural since the percentage of gridpoints 
changing in the omitted "later iterations" is low, and the LSM's computational cost is largely 
dependent on the number of unlocked gridpoints in each sweep. 
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Sweep 


Max 


% CPs 




P 


R 


# 


Change 


changing 








1 


l.OOc+008 


26.22 


- 


- 


- 


2 


l.OOOo+OOS 


31.856 


- 


- 


- 


3 


l.OOOc+008 


58.247 


44.595 


1.7709 


4.8179 


4 


2.7622C-001 


44.4527 


1.4685 


1.1027 


1.2445 


5 


6.3846e-003 


41.5341 


1.4224 


1.0888 


1.1995 


6 


5.9641e-003 


41.1957 


1.4195 


1.0759 


1.1995 


7 


5.9641e-003 


41.0730 


1.3832 


1.0631 


1.1951 


8 


5.4993e-003 


40.1919 


1.3331 


1.0509 


1.1562 


9 


4.9918C-003 


37.0650 


1.3243 


1.0440 


1.1205 


10 


4.9918C-003 


36.6337 


1.3230 


1.0377 


1.1205 


11 


4.9918e-003 


36.3995 


1.2881 


1.0314 


1.1191 


12 


4.7740C-003 


34.6743 


1.2492 


1.0255 


1.0854 


13 


4.5076C-003 


31.3318 


1.2403 


1.0218 


1.0532 


14 


4.5076C-003 


30.8150 


1.2400 


1.0185 


1.0520 


15 


4.5076C-003 


30.4767 


1.2098 


1.0152 


1.0511 


16 


4.1600C-003 


28.0934 


1.1820 


1.0121 


1.0270 


17 


3.6304e-003 


22.6502 


1.1646 


1.0102 


1.0004 


18 


3.6304C-003 


21.8062 


1.1644 


1.0085 


1.0000 


19 


3.6304C-003 


21.2374 


1.1467 


1.0068 


1.0000 


20 


3.2984C-003 


19.1404 


1.1268 


1.0052 


1.0000 


21 


2.7917e-003 


14.3367 


1.1079 


1.0043 


1.0000 


22 


2.7142C-003 


13.6340 


1.1079 


1.0035 


1.0000 


23 


2.7142e-003 


13.2250 


1.0951 


1.0027 


1.0000 


24 


2.4311C-003 


11.6746 


1.0812 


1.0020 


1.0000 


25 


2.1725C-003 


8.4659 


1.0637 


1.0015 


1.0000 


26 


1.8533C-003 


7.9677 


1.0630 


1.0012 


1.0000 


27 


1.8533C-003 


7.6852 


1.0546 


1.0009 


1.0000 


28 


1.7075C-003 


6.6664 


1.0457 


1.0006 


1.0000 


29 


1.5049C-003 


4.8000 


1.0365 


1.0004 


1.0000 


30 


1.1216e-003 


4.4653 


1.0303 


1.0003 


1.0000 


31 


1.1216C-003 


4.2646 


1.0257 


1.0002 


1.0000 


32 


1.0109C-003 


3.5656 


1.0209 


1.0001 


1.0000 


33 


8.5675e-004 


2.2754 


1.0153 


1.0001 


1.0000 


34 


4.8751e-004 


2.0300 


1.0110 


1.0001 


1.0000 


35 


4.8751e-004 


1.8813 


1.0087 


1.0000 


1.0000 


36 


4.2582e-004 


1.4314 


1.0068 


1.0000 


1.0000 


37 


3.4338e-004 


0.7064 


1.0043 


1.0000 


1.0000 


38 


1.1188C-004 


0.5689 


1.0025 


1.0000 


1.0000 


39 


1.1188C-004 


0.4871 


1.0015 


1.0000 


1.0000 


40 


8.9968e-005 


0.2863 


1.0011 


1.0000 


1.0000 


41 


6.8284e-005 


0.0632 


1.0006 


1.0000 


1.0000 


42 


2.4066C-005 


0.0297 


1.0002 


1.0000 


1.0000 


43 


1.0931C-005 


0.0112 


1.0000 


1.0000 


1.0000 


44 


O.OOOOc+000 


0.0000 


1.0000 


1.0000 


1.0000 



A 

Table 4 

Maximum change of V for the sweeping methods for the 
and 1312 X 1312 grid (B). 



Sweep 


Max 


% CPs 




P 


R 


# 


Change 


changing 








1 


I.O0+OO8 


25.2 


- 


- 


- 


2 


l.OOOe+008 


34.249 


- 


- 


- 


3 


l.OOOc+008 


62.372 


48.051 


9.2339 


30.026 


4 


3.621e-001 


49.221 


12.002 


4.1935 


7.7797 


5 


1.0709C-002 


43.0590 


11.194 


3.9168 


7.7098 


6 


1.0252C-002 


42.1586 


10.474 


3.6528 


7.2822 


7 


1.0252C-002 


42.0001 


10.269 


3.3925 


7.2771 


8 


1.0229C-002 


39.8694 


9.6885 


3.1431 


6.9538 


9 


1.0207C-002 


34.3652 


9.6713 


2.9458 


6.9386 


10 


1.0207C-002 


33.2951 


9.4074 


2.7562 


6.5653 


11 


1.0207C-002 


33.1453 


9.0914 


2.5686 


6.5623 


12 


1.0185C-002 


31.2973 


8.6218 


2.3892 


6.1648 


13 


1.0165C-002 


26.4975 


8.5771 


2.2483 


6.1607 


14 


1.0165C-002 


25.5465 


8.3781 


2.1135 


5.8497 


15 


1.0165C-002 


25.3961 


8.0972 


1.9804 


5.8487 


16 


1.0145C-002 


23.7761 


7.5600 


1.8536 


5.4607 


17 


1.0127C-002 


19.6460 


7.5550 


1.7561 


5.4571 


18 


1.0127C-002 


18.8095 


7.2488 


1.6635 


5.0667 


19 


1.0127C-002 


18.6819 


7.0133 


1.5722 


5.0658 


20 


1.0108C-002 


17.2760 


6.5857 


1.4861 


4.7569 


21 


1.0092C-002 


13.8028 


6.5691 


1.4221 


4.7449 


22 


1.0092C-002 


13.0992 


6.2438 


1.3619 


4.3682 


23 


1.0092C-002 


12.9746 


6.0465 


1.3027 


4.3674 


24 


1.0075C-002 


11.8224 


5.5031 


1.2475 


3.9749 


25 


1.0060C-002 


9.0073 


5.4990 


1.2088 


3.9720 


26 


1.0060C-002 


8.4400 


5.2424 


1.1730 


3.6712 


27 


1.0060C-002 


8.3202 


5.0816 


1.1380 


3.6705 


28 


1.0045C-002 


7.3932 


4.5433 


1.1059 


3.2817 


29 


1.0031C-002 


5.2311 


4.5402 


1.0854 


3.2794 


30 


1.0031C-002 


4.8007 


4.1140 


1.0670 


2.8875 


31 


1.0031C-002 


4.7054 


3.9971 


1.0491 


2.8871 


32 


1.0018C-002 


4.0108 


3.5893 


1.0334 


2.5926 


33 


1.0005C-002 


2.5072 


3.5711 


1.0249 


2.5795 


34 


1.0005C-002 


2.2144 


3.1264 


1.0177 


2.2008 


35 


1.0005C-002 


2.1276 


3.0465 


1.0109 


2.2005 


36 


9.9928C-003 


1.6782 


2.4976 


1.0053 


1.8040 


37 


9.9809C-003 


0.8296 


2.4942 


1.0034 


1.8016 


38 


9.9809C-003 


0.6789 


2.1526 


1.0020 


1.5223 


39 


9.9809C-003 


0.6047 


2.1076 


1.0009 


1.5223 


40 


9.9619C-003 


0.3888 


1.5638 


1.0002 


1.1295 


41 


5.0711C-003 


0.1151 


1.5638 


1.0001 


1.1295 


42 


4.9343C-003 


0.0698 


1.1631 


1.0000 


1.0000 


43 


1.4668C-003 


0.0338 


1.0000 


1.0000 


1.0000 


44 


O.OOOOc+000 


0.0000 


1.0000 


1.0000 


1.0000 



B 

.12 

41 X 41 checkerboard example on the 164 X 164 grid (A) 



4.5. Continuous speed functions with general boundary conditions. Next wc return to 
speed functions F{x, y) = 1 + 0.99 sin(27ra;) sm[2iTy) and F[x^ y) — 1 + 5 sin(207ra:) sin(207r?/), but this 
time with zero boundary conditions on the entire boundary of the square. The performance data is 
summarized in Tables |4. 131 and [4. 141 

Remark 4.5. Our current implementation of FMSM treats the coarse gridpoints nearest to 
the boundary as Accepted in the initialization. If there is more than one coarse gridpoint in the exit 
set, as in the following examples, care must be taken when ranking the "acceptance order" of these 
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coarse gridpoints. While in the case of single-point exit sets it is safe to assign a zero value to these 
coarse gridpoints, for general boundary conditions we compute the values by a one-sided update 
from the cell center to the nearest point on the boundary. In addition, our FMSM implementation 
iterates FSM to convergence on all cells containing parts of Q. 




Grid Size 


Loo Error 


ii Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


1.3670C-003 


3.7171C-004 


3.89 


24.3 


6.62 


24 



METHOD 


TIME 




P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


3.48 








1.853 


10.273 




HCM 44 X 44 cells 


2.92 








1.470 


6.811 




HCM 88 X 88 cells 


2.53 








1.195 


4.987 




HCM 176 X 176 cells 


2.35 








1.098 


4.301 




HCM 352 X 352 cells 


2.37 








1.046 


3.951 




HCM 704 X 704 cells 


2.91 








1.018 


3.785 




FHCM 22 X 22 cells 


2.60 


20660 


1.5321 


3.2150 


2.915 


4.498 


54.5 


FHCM 44 X 44 cells 


1.95 


62.164 


1.2465 


1.5447 


1.539 


2.502 


68.0 


FHCM 88 X 88 cells 


1.66 


64.719 


1.0187 


1.0128 


1.223 


1.749 


83.9 


FHCM 176 X 176 cells 


1.55 


5.7122 


1.0032 


1.0063 


1.102 


1.361 


92.4 


FHCM 352 X 352 cells 


1.66 


1.1083 


1.0007 


1.0011 


1.047 


1.165 


97.5 


FHCM 704 X 704 cells 


2.40 


1.0192 


1.0001 


1.0001 


1.018 


1.064 


100.0 


FMSM 22 X 22 cells 


1.97 


1.6383e-(-5 


7.8665 


12.339 




2.184 




FMSM 44 X 44 cells 


1.67 


1.1325e-|-6 


2.6113 


4.2370 




1.892 




FMSM 88 X 88 cells 


1.42 


5506.21 


1.0388 


1.8072 




1.527 




FMSM 176 X 176 cells 


1.29 


859.45 


1.0044 


1.2609 




1.265 




FMSM 352 X 352 cells 


1.40 


253.58 


1.0009 


1.0270 




1.134 




FMSM 704 X 704 cells 


2.17 


6.6107 


1.0001 


1.0000 




1.062 





5. Conclusions. We have introduced three new efficient hybrid methods for Eikonal equations. 
Using a splitting of the domain into a number of cells (with the speed function approximately 
constant on each of them), our methods employ sweeping methods on individual cells with the order 
of cell-processing and the direction of sweeps determined by a marching-like procedure on a coarser 
scale. Such techniques may introduce additional errors to attain higher computational efficiency. Of 
these new methods FMSM is generally the fastest and somewhat easier to implement, while FHCM 
introduces smaller additional errors, and HCM is usually the slowest of the three but provably 
converges to the exact solutions. The numerical evidence presented in this paper strongly suggests 
that 

• when h and h'^ are sufficiently small, additional errors introduced by FMSM and FHCM are 
negligible compared to those already present due to discretization; 

• for the right (/i, /i'^)-combinations, our new hybrid algorithms significantly outperform the prior 
fast methods (FMM, FSM, and LSM). 
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Table 4.14 

Performance/convergence results for F{x,y) = 1 + 0.99 sin(27ri;) sin(27r2/) with Q = dQ. 



Grid Size 


Lao Error 


Li Error 


FMM Time 


FSM Time 


LSM Time 


# Sweeps 


1408 X 1408 


2.2246e-002 


2.7572e-004 


3.66 


8.06 


2.58 


8 



METHOD 


TIME 


71 


P 


R 


AvHR 


AvS 


Mon % 


HCM 22 X 22 cells 


2.03 








1.176 


4.448 




HCM 44 X 44 cells 


1.97 








1.089 


4.021 




HCM 88 X 88 cells 


1.93 








1.047 


3.830 




HCM 176 X 176 cells 


1.96 








1.020 


3.718 




HCM 352 X 352 cells 


2.10 








1.009 


3.670 




HCM 704 X 704 cells 


2.74 








1.006 


3.649 




FHCM 22 X 22 cells 


1.51 


136.37 


1.0001 


1.0000 


1.176 


1.903 


93.4 


FHCM 44 X 44 cells 


1.35 


2.4167 


1.0000 


1.0000 


1.091 


1.443 


99.0 


FHCM 88 X 88 cells 


1.32 


2.4167 


1.0000 


1.0000 


1.048 


1.226 


99.6 


FHCM 176 X 176 cells 


1.39 


1.6390 


1.0000 


1.0000 


1.020 


1.110 


99.8 


FHCM 352 X 352 cells 


1.57 


1.0000 


1.0000 


1.0000 


1.009 


1.054 


99.9 


FHCM 704 X 704 cells 


2.33 


1.0000 


1.0000 


1.0000 


1.006 


1.028 


100.0 


FMSM 22 X 22 cells 


1.57 


12592 


1.0441 


1.0000 




1.599 




FMSM 44 X 44 cells 


1.27 


355.53 


1.0088 


1.0000 




1.306 




FMSM 88 X 88 cells 


1.15 


355.53 


1.0055 


1.0000 




1.157 




FMSM 176 X 176 cells 


1.14 


303.61 


1.0030 


1.0000 




1.079 




FMSM 352 X 352 cells 


1.31 


134.60 


1.0012 


1.0000 




1.040 




FMSM 704 X 704 cells 


2.11 


68.199 


1.0004 


1.0000 




1.014 





Of course, the rate of change of the speed function F determines the suitable size of cells and our 
methods are particularly efficient for the examples where F is piecewise constant. 

All of the examples considered here used predetermined uniform cell-sizes. From a practitioner's 
point of view, the value of the proposed methods will greatly increase once we develop bounds and 
estimates for the additional errors in both FMSM and FHCM. Such estimates would be also very 
useful for the computational costs of all three hybrid methods on a given cell-decomposition. In the 
future, we intend to automate the choice of cell-sizes (based on the speed function and user-specified 
tolerances for additional errors) and further relax the requirement that all cells need to be uniform. 
A generalization of this approach to cell-subdivision of unstructured meshes will also be valuable. 

We expect the extensions of these techniques to higher dimensional problems to be useful for 
many applications and relatively straight-forward - especially for FMSM and HCM. A higher dimen- 
sional version of the "cell boundary monotonicity check" will be needed to extend FHCM. 

Other obvious directions for future work include extensions to higher-order accurate discretiza- 
tions and parallelizable cell-level numerical methods for Eikonal equations. 

More generally, we hope that the ideas presented here can serve as a basis for causal domain 
decomposition and efficient two-scale methods for other static nonlinear PDEs, including those arising 
in anisotropic optimal control and differential games. 
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